RELATING TO BINDING PROTEINS FOR
RECOGNITION OF DNA 

This application is the national phase of international
application PCT/GB95/01949, filed Aug. 17, 1995 which 
designated the U.S. 

FIELD OF THE INVENTION 

This invention relates inter alia to methods of selecting 
and designing polypeptides comprising zinc finger binding 
motifs, polypeptides made by the method(s) of the invention 
and to various applications thereof. 

BACKGROUND OF THE INVENTION 

Selective gene expression is mediated via the interaction 
of protein transcription factors with specific nucleotide 
sequences within the regulatory region of the gene. The most 
widely used domain within protein transcription factors 
appears to be the zinc finger (Zf) motif. This is an
independently folded zinc-containing mini-domain which is used in
a modular repeating fashion to achieve sequence-specific
recognition of DNA (Klug 1993 Gene 135, 83-92). The first
zinc finger motif was identified in the Xenopus transcription 
factor TFIIIA (Miller et al., 1985 EMBO J. 4, 1609-1614). 
The structure of Zf proteins has been determined by NMR 
studies (Lee et al., 1989 Science 245, 635-637) and crys- 
tallography (Pavletich & Pabo, 1991 Science 252, 809-812). 

The manner in which DNA-binding protein domains are 
able to discriminate between different DNA sequences is an 
important question in understanding crucial processes such 
as the control of gene expression in differentiation and 
development. The zinc finger motif has been studied 
extensively, with a view to providing some insight into this 
problem, owing to its remarkable prevalence in the eukary- 
otic genome, and its important role in proteins which control 
gene expression in Drosophila (e.g. Harrison & Travers 
1990 EMBO J. 9, 207-216), the mouse (Christy et al, 1988 
Proc. Natl. Acad. Sci. USA 85, 7857-7861) and humans 
(Kinzler et al., 1988 Nature (London) 332, 371). 

Most sequence-specific DNA-binding proteins bind to the 
DNA double helix by inserting an α-helix into the major
groove (Pabo & Sauer 1992 Annu. Rev. Biochem. 61, 
1053-1095; Harrison 1991 Nature (London) 353, 715-719; 
and Klug 1993 Gene 135, 83-92). Sequence specificity 
results from the geometrical and chemical complementarity 
between the amino acid side chains of the α-helix and the
accessible groups exposed on the edges of base-pairs. In 
addition to this direct reading of the DNA sequence, inter- 
actions with the DNA backbone stabilise the complex and 
are sensitive to the conformation of the nucleic acid, which 
in turn depends on the base sequence (Dickerson & Drew 
1981 J. Mol. Biol. 149, 761-786). A priori, a simple set of 
rules might suffice to explain the specific association of 
protein and DNA in all complexes, based on the possibility 
that certain amino acid side chains have preferences for 
particular base-pairs. However, crystal structures of protein- 
DNA complexes have shown that proteins can be idiosyn- 
cratic in their mode of DNA recognition, at least partly 
because they may use alternative geometries to present their 
sensory α-helices to DNA, allowing a variety of different
base contacts to be made by a single amino acid and vice 
versa (Matthews 1988 Nature (London) 335, 294-295). 

Mutagenesis of Zf proteins has confirmed modularity of 
the domains. Site directed mutagenesis has been used to 
change key Zf residues, identified through sequence homol- 
ogy alignment, and from the structural data, resulting in 
altered specificity of Zf domain (Nardelli et al., 1992 NAR
20, 4137-4144). The authors suggested that although design
of novel binding specificities would be desirable, design
would need to take into account sequence and structural
data. They state "there is no prospect of achieving a zinc
finger recognition code".

Despite this, many groups have been trying to work
towards such a code, although only limited rules have so far
been proposed. For example, Desjarlais et al. (1992b PNAS
89, 7345-7349) used systematic mutation of two of the three
contact residues (based on consensus sequences) in finger
two of the polypeptide Sp1 to suggest that a limited
degenerate code might exist. Subsequently the authors used this to
design three Zf proteins with different binding specificities
and affinities (Desjarlais & Berg, 1993 PNAS 90,
2250-2260). They state that the design of Zf proteins with
predictable specificities and affinities "may not always be
straightforward".

We believe the zinc finger of the TFIIIA class to be a good
candidate for deriving a set of more generally applicable
specificity rules owing to its great simplicity of structure and
interaction with DNA. The zinc finger is an independently
folding domain which uses a zinc ion to stabilise the packing
of an antiparallel β-sheet against an α-helix (Miller et al.,
1985 EMBO J. 4, 1609-1614; Berg 1988 Proc. Natl. Acad.
Sci. USA 85, 99-102; and Lee et al., 1989 Science 245,
635-637). The crystal structures of zinc finger-DNA
complexes show a semiconserved pattern of interactions in
which 3 amino acids from the α-helix contact 3 adjacent
bases (a triplet) in DNA (Pavletich & Pabo 1991 Science
252, 809-817; Fairall et al., 1993 Nature (London) 366,
483-487; and Pavletich & Pabo 1993 Science 261,
1701-1707). Thus the mode of DNA recognition is
principally a one-to-one interaction between amino acids and
bases. Because zinc fingers function as independent modules
(Miller et al., 1985 EMBO J. 4, 1609-1614; Klug & Rhodes
1987 Trends Biochem. Sci. 12, 464-469), it should be
possible for fingers with different triplet specificities to be
combined to give specific recognition of longer DNA
sequences. Each finger is folded so that three amino acids
are presented for binding to the DNA target sequence,
although binding may be directly through only two of these
positions. In the case of Zif268 for example, the protein is
made up of three fingers which contact a 9 base pair
contiguous sequence of target DNA. A linker sequence is
found between fingers which appears to make no direct
contact with the nucleic acid.
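
The one-finger-per-triplet picture described above can be illustrated with a minimal sketch that divides a target site into consecutive triplets, one for each finger. The function name is hypothetical, and the example site (GCGTGGGCG, the Zif268 consensus binding site) is supplied here for illustration and is not quoted in this section. The sketch does not model which finger contacts which triplet.

```python
def split_into_triplets(site: str) -> list[str]:
    """Split a DNA site into consecutive base triplets (illustrative helper)."""
    site = site.upper()
    if set(site) - set("ACGT"):
        raise ValueError("site must contain only A, C, G or T")
    if len(site) % 3 != 0:
        raise ValueError("site length must be a multiple of three")
    # One triplet per zinc finger, read left to right along the given strand.
    return [site[i:i + 3] for i in range(0, len(site), 3)]


# Assumed example input: a 9 base pair site, as bound by a three-finger protein.
triplets = split_into_triplets("GCGTGGGCG")
print(triplets)       # ['GCG', 'TGG', 'GCG']
print(len(triplets))  # 3 fingers would be needed
```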

Protein engineering experiments have shown that it is
possible to alter rationally the DNA-binding characteristics
of individual zinc fingers when one or more of the α-helical
positions is varied in a number of proteins (Nardelli et al.,
1991 Nature (London) 349, 175-178; Nardelli et al., 1992
Nucleic Acids Res. 20, 4137-4144; and Desjarlais & Berg
1992a Proteins 13, 272). It has already been possible to
propose some principles relating amino acids on the α-helix
to corresponding bases in the bound DNA sequence
(Desjarlais & Berg 1992b Proc. Natl. Acad. Sci. USA 89,
7345-7349). However in this approach the altered positions
on the α-helix are prejudged, making it possible to overlook
the role of positions which are not currently considered
important; and secondly, owing to the importance of context,
concomitant alterations are sometimes required to affect
specificity (Desjarlais & Berg 1992b), so that a significant
correlation between an amino acid and base may be
misconstrued.

To investigate binding of mutant Zf proteins, Thiesen and
Bach (1991 FEBS 283, 23-26) mutated Zf fingers and
studied their binding to randomised oligonucleotides, using
electrophoretic mobility shift assays. Subsequent use of
phage display technology has permitted the expression of
random libraries of Zf mutant proteins on the surface of
bacteriophage. The three Zf domains of Zif268, with 4
positions within finger one randomised, have been displayed
on the surface of filamentous phage by Rebar and Pabo
(1994 Science 263, 671-673). The library was then
subjected to rounds of affinity selection by binding to target
DNA oligonucleotide sequences in order to obtain Zf
proteins with new binding specificities. Randomised
mutagenesis (at the same positions as those selected by Rebar &
Pabo) of finger 1 of Zif268 with phage display has also been
used by Jamieson et al. (1994 Biochemistry 33, 5689-5695)
to create novel binding specificity and affinity.

[...] positions randomised. Six triplets were used in selections
but did not return fingers with any sequence biases; and
when the three triplets of the Zif268 binding site were
individually used as controls, the vast majority of selected
fingers did not resemble the sequences of the wild-type
Zif268 fingers and, though capable of tight binding to their
target sites in vitro, were usually not able to discriminate
strongly against different triplets. The authors interpret the
results as evidence against the existence of a code.

In summary it is known that Zf protein motifs are
widespread in DNA binding proteins and that binding is via
three key amino acids, each one contacting a single base pair
in the target DNA sequence. Motifs are modular and may be
linked together to form a set of fingers which recognise a
contiguous DNA sequence (e.g. a three fingered protein will
recognise a 9 mer etc). The key residues involved in DNA
binding have been identified through sequence data and
from structural information. Directed and random
mutagenesis has confirmed the role of these amino acids in
determining specificity and affinity. Phage display has been used
to screen for new binding specificities of random mutants of
fingers. A recognition code, to aid design of new finger
specificities, has been worked towards although it has been
suggested that specificity may be difficult to predict.

SUMMARY OF THE INVENTION

In a first aspect the invention provides a library of DNA
sequences, each sequence encoding at least one zinc finger
binding motif for display on a viral particle, the sequences
coding for zinc finger binding motifs having random
allocation of amino acids at positions -1, +2, +3, +6 and at least
one of positions +1, +5 and +8.

A zinc finger binding motif is the α-helical structural
motif found in zinc finger binding proteins, well known to
those skilled in the art. The above numbering is based on the
first amino acid in the α-helix of the zinc finger binding
motif being position +1. It will be apparent to those skilled
in the art that the amino acid residue at position -1 does not,
strictly speaking, form part of the α-helix of the zinc binding
finger motif. Nevertheless, the residue at -1 is shown to be
very important functionally and is therefore considered as
part of the binding motif α-helix for the purposes of the
present invention.

The sequences may code for zinc finger binding motifs
having random allocation at all of positions +1, +5 and +8.
The sequences may also be randomised at other positions
(e.g. at position +9, although it is generally preferred to
retain an arginine or a lysine residue at this position).
Further, whilst allocation of amino acids at the designated
"random" positions may be genuinely random, it is preferred
to avoid a hydrophobic residue (Phe, Trp or Tyr) or a
cysteine residue at such positions.

Preferably the zinc finger binding motif is present within
the context of other amino acids (which may be present in
zinc finger proteins), so as to form a zinc finger (which
includes an antiparallel β-sheet). Further, the zinc finger is
preferably displayed as part of a zinc finger polypeptide,
which polypeptide comprises a plurality of zinc fingers
joined by an intervening linker peptide. Typically the library
of sequences is such that the zinc finger polypeptide will
comprise two or more zinc fingers of defined amino acid
sequence (generally the wild type sequence) and one zinc
finger having a zinc binding motif randomised in the
manner defined above. It is preferred that the randomised
[...] DNA, which helps to increase the binding specificity of the
randomised finger.

Preferably the sequences encode the randomised binding
motif of the middle finger of the Zif268 polypeptide.
Conveniently, the sequences also encode those amino acids
N-terminal and C-terminal of the middle finger in wild type
Zif268, which encode the first and third zinc fingers
respectively. In a particular embodiment, the sequence encodes the
whole of the Zif268 polypeptide. Those skilled in the art will
appreciate that alterations may also be made to the sequence
of the linker peptide and/or the β-sheet of the zinc finger
polypeptide.

In a further aspect, the invention provides a library of
DNA sequences, each sequence encoding the zinc finger
binding motif of at least a middle finger of a zinc finger
binding polypeptide for display on a viral particle, the
sequences coding for the binding motif having random
allocation of amino acids at [...]
Conveniently, the zinc finger polypeptide will be Zif268.

Typically, the sequences of either library are such that the
zinc finger binding domain can be cloned as a fusion with
the minor coat protein (pIII) of bacteriophage fd.
Conveniently, the encoded polypeptide includes the
tripeptide sequence Met-Ala-Glu as the N terminal of the zinc
finger domain, which is known to allow expression and
display using the bacteriophage fd system. Desirably the
library comprises 10^6 or more different sequences (ideally,
as many as is practicable).

In another aspect the invention provides a method of
designing a zinc finger polypeptide for binding to a
particular target DNA sequence, comprising screening each of a
plurality of zinc finger binding motifs against at least an
effective portion of the target DNA sequence and selecting
those motifs which bind to the target DNA sequence. An
effective portion of the target DNA sequence is a sufficient
length of DNA to allow binding of the zinc binding motif to
the DNA. This is the minimum sequence information
(concerning the target DNA sequence) that is required.
Desirably at least two, preferably three or more, rounds of
screening are performed.

The invention also provides a method of designing a zinc
finger polypeptide for binding to a particular target DNA
sequence, comprising comparing the binding of each of a
plurality of zinc finger binding motifs to one or more DNA
triplets, and selecting those motifs exhibiting preferable
binding characteristics. Preferably the method defined
immediately above is preceded by a screening step
according to the method defined in the previous paragraph.
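
As a rough guide to the library sizes implied by the randomisation of the first aspect (positions -1, +2, +3, +6 plus at least one of +1, +5 and +8, i.e. five to seven positions), the following sketch counts the distinct motifs possible at the protein level. It assumes each randomised position varies independently, and the function name and the treatment of the avoided residues (Phe, Trp, Tyr, Cys) as a strict exclusion are illustrative assumptions; the codon scheme actually used is not specified here.

```python
# One-letter codes for the 20 standard amino acids.
AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")

# Residues the text prefers to avoid at randomised positions
# (Phe, Trp, Tyr and Cys); treated here as excluded outright.
AVOIDED = frozenset("FWYC")


def motif_diversity(n_positions: int, excluded: frozenset = frozenset()) -> int:
    """Number of distinct motifs with n independently randomised positions."""
    allowed = len(AMINO_ACIDS - excluded)
    return allowed ** n_positions


for n in (5, 7):
    print(n, motif_diversity(n), motif_diversity(n, AVOIDED))
# 5 3200000 1048576
# 7 1280000000 268435456
```

Under these assumptions, five randomised positions drawn from the 16 permitted residues already give about 10^6 distinct motifs.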
It is thus preferred that there is a two-step selection
procedure: the first step comprising screening each of a
plurality of zinc finger binding motifs (typically in the form
of a display library), mainly or wholly on the basis of affinity
for the target sequence; the second step comprising
comparing binding characteristics of those motifs selected by the
initial screening step, and selecting those having preferable
binding characteristics for a particular DNA triplet.

Where the plurality of zinc finger binding motifs is
screened against a single DNA triplet, it is preferred that the
triplet is represented in the target DNA sequence at the
appropriate position. However, it is also desirable to compare
the binding of the plurality of zinc binding motifs to one or
more DNA triplets not represented in the target DNA
sequence (e.g. differing by just one of the three base pairs)
in order to compare the specificity of binding of the various
binding motifs. The plurality of zinc finger binding motifs
may be screened against all 64 possible permutations of 3
DNA bases.

Once suitable zinc finger binding motifs have been
identified and obtained, they will advantageously be combined in
a single zinc finger polypeptide. Typically this will be
accomplished by use of recombinant DNA technology;
conveniently a phage display system may be used.

In another aspect, the invention provides a DNA library
consisting of 64 sequences, each sequence comprising a
different one of the 64 possible permutations of three DNA
bases in a form suitable for use in the selection method
defined above. Desirably the sequences are associated, or
capable of being associated, with separation means.
Advantageously, the separation means is selected from one
of the following: microtitre plate; magnetic beads; or affinity
chromatography column. Conveniently the sequences are
biotinylated. Preferably the sequences are contained within
12 mini-libraries, as explained elsewhere.

In a further aspect the invention provides a zinc finger
polypeptide designed by one or both of the methods defined
above. Preferably the zinc finger polypeptide designed by
the method comprises a combination of a plurality of zinc
fingers (adjacent zinc fingers being joined by an intervening
linker peptide), each finger comprising a zinc finger binding
motif. Desirably, each zinc finger binding motif in the zinc
finger polypeptide has been selected for preferable binding
characteristics by the method defined above. The
intervening linker peptide may be the same between each adjacent
zinc finger or, alternatively, the same zinc finger polypeptide
may contain a number of different linker peptides. The
intervening linker peptide may be one that is present in
naturally-occurring zinc finger polypeptides or may be an
artificial sequence. In particular, the sequence of the
intervening linker peptide may be varied, for example, to
optimise binding of the zinc finger polypeptide to the target
sequence.

Where the zinc finger polypeptide comprises a plurality of
zinc binding motifs, it is preferred that each motif binds to
those DNA triplets which represent contiguous or
substantially contiguous DNA in the sequence of interest. Where
several candidate binding motifs or candidate combinations
of motifs exist, these may be screened against the actual
target sequence to determine the optimum composition of
the polypeptide. Competitor DNA may be included in the
screening assay for comparison, as described below.

The non-specific component of all protein-DNA
interactions, which includes contacts to the sugar-phosphate
backbone as well as ambiguous contacts to base-pairs, is a
considerable driving force towards complex formation and
can result in the selection of DNA-binding proteins with
reasonable affinity but without specificity for a given DNA
sequence. Therefore, in order to minimise these non-specific
interactions when designing a polypeptide, selections should
preferably be performed with low concentrations of specific
binding site in a background of competitor DNA, and
binding should desirably take place in solution to avoid local
concentration effects and the avidity of multivalent phage
for ligands immobilised on solid surfaces.

As a safeguard against spurious selections, the specificity
of individual phage should be determined following the final
round of selection. Instead of testing for binding to a small
number of binding sites, it would be desirable to screen all
possible DNA sequences.

It has now been shown by the present inventors
(below) that it is possible to design a truly modular zinc
binding polypeptide
wherein the zinc binding motif of each zinc binding finger
is selected on the basis of its affinity for a particular triplet.
Accordingly, it should be well within the capability of one
of normal skill in the art to design a zinc finger polypeptide
capable of binding to any desired target DNA sequence
simply by considering the sequence of triplets present in the
target DNA and combining in the appropriate order zinc
fingers comprising zinc finger binding motifs having the
necessary binding characteristics to bind thereto. The greater
the length of known sequence of the target DNA, the greater
the number of zinc finger binding motifs that can be included
in the zinc finger polypeptide. For example, if the known
sequence is only 9 bases long then three zinc finger binding
motifs can be included in the polypeptide. If the known
sequence is 27 bases long then, in theory, up to nine binding
motifs could be included in the polypeptide. The longer the
target DNA sequence, the lower the probability of its
occurrence in any given portion of DNA.

Moreover, those motifs selected for inclusion in the
polypeptide could be artificially modified (e.g. by directed
mutagenesis) in order to optimise further their binding
characteristics. Alternatively (or additionally) the length and
amino acid sequence of the linker peptide joining adjacent
zinc binding fingers could be varied, as outlined above. This
may have the effect of altering the position of the zinc finger
binding motif relative to the DNA sequence of interest, and
thereby exert a further influence on binding characteristics.

Generally, it will be preferred to select those motifs
having high affinity and high specificity for the target triplet.

In a further aspect, the invention provides a kit for making
a zinc finger polypeptide for binding to a nucleic acid
sequence of interest, comprising: a library of DNA
sequences encoding zinc finger binding motifs of known
binding characteristics in a form suitable for cloning into a
vector; a vector molecule suitable for accepting one or more
sequences from the library; and instructions for use.

Preferably the vector is capable of directing the
expression of the cloned sequences as a single zinc finger
polypeptide. In particular it is preferred that the vector is capable of
directing the expression of the cloned sequences as a single
zinc finger polypeptide displayed on the surface of a viral
particle, typically of the sort of viral display particle which
are known to those skilled in the art. The DNA sequences are
preferably in such a form that the expressed polypeptides are
capable of self-assembling into a number of zinc finger
polypeptides.

It will be apparent that the kit defined above will be of
particular use in designing a zinc finger polypeptide
comprising a plurality of zinc finger binding motifs, the binding
characteristics of which are already known. In another
aspect the invention provides a kit for use when zinc finger
binding motifs with suitable binding characteristics have not
yet been identified, such that the invention provides a kit for
making a zinc finger polypeptide for binding to a nucleic
acid sequence of interest, comprising: a library of DNA
sequences, each encoding a zinc finger binding motif in a
form suitable for screening and/or selecting according to the
methods defined above; and instructions for use.
Advantageously, the library of DNA sequences in the kit will
be a library in accordance with the first aspect of the
invention. Conveniently, the kit may also comprise a library
of 64 DNA sequences, each sequence comprising a different
one of the 64 possible permutations of three DNA bases, in
a form suitable for use in the selection method defined
previously. Typically, the 64 sequences are present in 12
separate mini-libraries, each mini-library having one position
in the relevant triplet fixed and two positions randomised.
Preferably, the kit will also comprise appropriate buffer
solutions, and/or reagents for use in the detection of bound
zinc fingers. The kit may also usefully include a vector
suitable for accepting one or more sequences selected from
the library of DNA sequences encoding zinc finger binding
motifs.
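
The arithmetic behind the 12 mini-libraries can be checked with a short sketch: fixing one of the three triplet positions to one of the four bases gives 3 x 4 = 12 mini-libraries, each covering the 16 triplets in which the other two positions vary. The labels (e.g. "GNN") and function names are illustrative conventions and are not taken from the text.

```python
from itertools import product

BASES = "ACGT"


def all_triplets() -> list[str]:
    """All 64 ordered combinations of three DNA bases."""
    return ["".join(p) for p in product(BASES, repeat=3)]


def mini_libraries() -> dict[str, list[str]]:
    """Group triplets into mini-libraries with one position fixed.

    Keys are labels such as 'GNN' (first base fixed as G, others random).
    """
    libraries = {}
    for position in range(3):
        for base in BASES:
            label = "".join(base if i == position else "N" for i in range(3))
            libraries[label] = [t for t in all_triplets() if t[position] == base]
    return libraries


libs = mini_libraries()
print(len(all_triplets()))  # 64
print(len(libs))            # 12
print(libs["GNN"][:4])      # ['GAA', 'GAC', 'GAG', 'GAT']
```

Each triplet falls into exactly three mini-libraries (one per position), so the pattern of mini-libraries to which a finger binds could, in principle, indicate its preferred base at each position.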

In a preferred embodiment, the present teaching will be
used for isolating the genes for the middle zinc fingers
which, having been previously selected by one of the 64
triplets, are thought to have specific DNA binding activity.
The mixture of genes specifying fingers which bind to a
given triplet will be amplified by PCR using three sets of
primers. The sets will have unique restriction sites, which
will define the assembly of zinc fingers into three finger
polypeptides. The appropriate reagents are preferably
provided in kit form.

For instance, the first set of primers might have SfiI and
AgeI sites, the second set AgeI and EagI sites and third set
EagI and NotI sites. It will be noted that the "first" site will
preferably be SfiI, and the "last" site NotI, so as to facilitate
cloning into the SfiI and NotI sites of the phage vector. To
assemble a library of three finger proteins which recognise
the sequence AAAGGGGGG, the fingers selected by the
triplet GGG are amplified using the first two sets of primers
and ligated to the fingers selected by the triplet AAA
amplified using the third set of primers. The combinatorial
library is cloned on the surface of phage and a nine base-pair
site can be used to select the best combination of fingers en
bloc.
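
The assembly scheme in the AAAGGGGGG example can be sketched as follows. The sketch assumes that the first primer set is paired with the last triplet of the written sequence and the third primer set with the first triplet; this is consistent with the example but is not fully determined by it, since the last two triplets are identical. The function names and the clone labels in `pools` are hypothetical.

```python
from itertools import product


def triplets_by_primer_set(target: str) -> dict[int, str]:
    """Map primer set number (1-3) to the triplet its fingers should bind."""
    target = target.upper()
    if len(target) != 9 or set(target) - set("ACGT"):
        raise ValueError("expected a 9-base DNA sequence")
    triplets = [target[i:i + 3] for i in range(0, 9, 3)]
    # Assumption: primer set 1 pairs with the last triplet as written and
    # primer set 3 with the first, as in the AAAGGGGGG example.
    return {n: t for n, t in enumerate(reversed(triplets), start=1)}


def combinatorial_library(target: str, pools: dict[str, list[str]]) -> list[tuple]:
    """All three-finger combinations drawn from the per-triplet pools."""
    assignment = triplets_by_primer_set(target)
    per_position = [pools[assignment[n]] for n in (1, 2, 3)]
    return list(product(*per_position))


# Hypothetical pools of fingers previously selected by each triplet.
pools = {"GGG": ["G-a", "G-b"], "AAA": ["A-a", "A-b", "A-c"]}
print(triplets_by_primer_set("AAAGGGGGG"))             # {1: 'GGG', 2: 'GGG', 3: 'AAA'}
print(len(combinatorial_library("AAAGGGGGG", pools)))  # 12
```

The library size is simply the product of the pool sizes at the three positions (here 2 x 2 x 3), which is the set from which the nine base-pair site would select the best combination.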

The genes for fingers which bind to each of the 64 triplets 
can be amplified by each set of primers and cut using the 
appropriate restriction enzymes. These building blocks for 
three-finger proteins can be sold as components of a kit for 
use as described above. The same could be done for the
library amplified with different primers so that 4- or 5-finger
proteins could be built.

Additionally a large (pre-assembled) library of all
combinations of the fingers selected by all triplets can also be
developed for single-step selection of DNA-binding proteins
using 9 bp, or much longer, DNA fragments. For this
particular application, which will require very large libraries
of novel 3-finger proteins, it may be preferable to use
methods of selection other than phage display; for example
stalled polysomes (developed by Affymax) where protein and
mRNA become linked.
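
To give a sense of why such a pre-assembled library becomes very large, the sketch below counts three-finger combinations when every one of the 64 triplets contributes the same number of selected fingers and any finger may occupy any position. Both of these are simplifying assumptions made here for illustration, and the function name is hypothetical.

```python
def preassembled_library_size(fingers_per_triplet: int,
                              n_fingers: int = 3,
                              n_triplets: int = 64) -> int:
    """Count multi-finger combinations under the stated assumptions."""
    if fingers_per_triplet < 1 or n_fingers < 1:
        raise ValueError("counts must be positive")
    # Pool of candidate fingers available at each position.
    pool = n_triplets * fingers_per_triplet
    return pool ** n_fingers


print(preassembled_library_size(1))   # 262144
print(preassembled_library_size(10))  # 262144000
```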

In a further aspect the invention provides a method of
altering the expression of a gene of interest in a target cell,
comprising: determining (if necessary) at least part of the
DNA sequence of the structural region and/or a regulatory
region of the gene of interest; designing a zinc finger
polypeptide to bind to the DNA of known sequence; and
causing said zinc finger polypeptide to be present in the
target cell (preferably in the nucleus thereof). (It will be
apparent that the DNA sequence need not be determined if
it is already known.)

The regulatory region could be quite remote from the 
structural region of the gene of interest (e.g. a distant 
enhancer sequence or similar). Preferably the zinc finger 
polypeptide is designed by one or both of the methods of the 
invention defined above. 

Binding of the zinc finger polypeptide to the target 
sequence may result in increased or reduced expression of 
the gene of interest depending, for example, on the nature of 
the target sequence (e.g. structural or regulatory) to which 
the polypeptide binds. 

In addition, the zinc finger polypeptide may
advantageously comprise functional domains from other proteins
(e.g. catalytic domains from restriction enzymes,
recombinases, replicases, integrases and the like) or even
"synthetic" effector domains. The polypeptide may also
comprise activation or processing signals, such as nuclear
localisation signals. These are of particular usefulness in
targetting the polypeptide to the nucleus of the cell in order
to enhance the binding of the polypeptide to an intranuclear
target (such as genomic DNA). A particular example of such
a localisation signal is that from the large T antigen of SV40.
Such other functional domains/signals and the like are
conveniently present as a fusion with the zinc finger
polypeptide. Other desirable fusion partners comprise
immunoglobulins or fragments thereof (e.g. Fab, scFv)
having binding activity.

The zinc finger polypeptide may be synthesised in situ in 
the cell as a result of delivery to the cell of DNA directing 
expression of the polypeptide. Methods of facilitating
delivery of DNA are well-known to those skilled in the art and
include, for example, recombinant viral vectors (e.g. 
retroviruses, adenoviruses), liposomes and the like. 
Alternatively, the zinc finger polypeptide could be made 
outside the cell and then delivered thereto. Delivery could be 
facilitated by incorporating the polypeptide into liposomes 
etc. or by attaching the polypeptide to a targetting moiety 
(such as the binding portion of an antibody or hormone 
molecule). Indeed, one significant advantage of zinc finger 
proteins over oligonucleotides or protein-nucleic acids 
(PNAs) in controlling gene expression, would be the vector- 
free delivery of protein to target cells. Unlike the above, 
many examples of soluble proteins entering cells are known, 
including antibodies to cell surface receptors. The present 
inventors are currently carrying out fusions of anti-bcr-abl 
fingers (see example 3 below) to a single-chain (sc) Fv 
fragment capable of recognising NIP (4-hydroxy-5-iodo-3- 
nitrophenyl acetyl). Mouse transferrin conjugated with NIP 
will be used to deliver the fingers to mouse cells via the 
mouse transferrin receptor. 

Media (e.g. microtitre wells, resins etc.) coated with NIP 
can also be used as solid supports for zinc fingers fused to 
anti-NIP scFvs, for applications requiring immobilised zinc 
fingers (e.g. the purification of specific nucleic acids). 

In a particular embodiment, the invention provides a 
method of inhibiting cell division by causing the presence in 
a cell of a zinc finger polypeptide which inhibits the expres- 
sion of a gene enabling the cell to divide. 

In a specific embodiment, the invention provides a 
method of treating a cancer, comprising delivering to a 
patient, or causing to be present therein, a zinc finger 
polypeptide which inhibits the expression of a gene enabling
the cancer cells to divide. The target could be, for example,
an oncogene or a normal gene which is overexpressed in the
cancer cells.

To the best knowledge of the inventors, design of a zinc
finger polypeptide and its successful use in modulation of
gene expression (as described below) has never previously
been demonstrated. This breakthrough presents numerous
possibilities. In particular, zinc finger polypeptides could be
designed for therapeutic and/or prophylactic use in
regulating the expression of disease-associated genes. For example,
zinc finger polypeptides could be used to inhibit the
expression of foreign genes (e.g. the genes of bacterial or viral
pathogens) in man or animals, or to modify the expression
of mutated host genes (such [...]

The invention therefore provides a zinc finger polypeptide 
capable of inhibiting the expression of a disease-associated 
gene. Typically the zinc finger polypeptide will not be a 
naturally-occurring polypeptide but will be specifically 
designed to inhibit the expression of the disease-associated
gene. Conveniently the polypeptide will be designed by one 
or both of the methods of the invention defined above. 
Advantageously the disease-associated gene will be an 
oncogene, typically the BCR-ABL fusion oncogene or a ras 
oncogene. In a particular embodiment the invention pro- 
vides a zinc finger polypeptide designed to bind to the DNA 
sequence GCAGAAGCC and capable of inhibiting the
expression of the BCR-ABL fusion oncogene. 

In yet another aspect the invention provides a method of 
modifying a nucleic acid sequence of interest present in a 
sample mixture by binding thereto a zinc finger polypeptide, 
comprising contacting the sample mixture with a zinc finger 
polypeptide having affinity for at least a portion of the 
sequence of interest, so as to allow the zinc finger polypep- 
tide to bind specifically to the sequence of interest. 

The term "modifying" as used herein is intended to mean 
that the sequence is considered modified simply by the 
binding of the zinc finger polypeptide. It is not intended to 
suggest that the sequence of nucleotides is changed, 
although such changes (and others) could ensue following 
binding of the zinc finger polypeptide to the nucleic acid of 
interest. Conveniently the nucleic acid sequence is DNA. 

Modification of the nucleic acid of interest (in the sense
of binding thereto by a zinc finger polypeptide) could be
detected by any of a number of methods (e.g. gel mobility
shift assays, or use of labelled zinc finger polypeptides; labels
could include radioactive, fluorescent, enzyme or
biotin/streptavidin labels).

Modification of the nucleic acid sequence of interest (and 
detection thereof) may be all that is required (e.g. in diag- 
nosis of disease). Desirably however, further processing of 
the sample is performed. Conveniently the zinc finger 
polypeptide (and nucleic acid sequences specifically bound 
thereto) are separated from the rest of the sample.
Advantageously the zinc finger polypeptide is bound to a solid
phase support, to facilitate such separation. For example, the
zinc finger polypeptide may be present in an acrylamide or
agarose gel matrix or, more preferably, is immobilised on the
surface of a membrane or in the wells of a microtitre plate.

Possible uses of suitably designed zinc finger polypep- 
tides are: 

a) Therapy (e.g. targeting to double stranded DNA)

b) Diagnosis (e.g. detecting mutations in gene sequences: 
the present work has shown that "tailor made" zinc finger 
polypeptides can distinguish DNA sequences differing by 
one base pair). 
c) DNA purification (the zinc finger polypeptide could be
used to purify restriction fragments from solution, or to
visualise DNA fragments on a gel [for example, where the
polypeptide is linked to an appropriate fusion partner, or is
detected by probing with an antibody]).

In addition, zinc finger polypeptides could even be tar- 
geted to other nucleic acids such as ss or ds RNA (e.g. 
self-complementary RNA such as is present in many RNA 
molecules) or to RNA-DNA hybrids, which would present 
another possible mechanism of affecting cellular events at 
the molecular level. 

In Example 1 the inventors describe and successfully
demonstrate the use of the phage display technique to
construct and screen a random zinc finger binding motif
library, using a defined oligonucleotide target sequence.

In Example 2 is disclosed the analysis of zinc finger 
binding motif sequences selected by the screening procedure 
of Example 1, the DNA-specificity of the motifs being 
studied by binding to a mini-library of randomised DNA 
target sequences to reveal a pattern of acceptable bases at 
each position in the target triplet — a "binding site signature". 

In Example 3, the findings of the first two sections are 
used to select and modify rationally a zinc finger binding 
polypeptide in order to bind to a particular DNA target with 
high affinity: it is convincingly shown that the peptide binds 
to the target sequence and can modify gene expression in 
cells cultured in vitro. 

Example 4 describes the development of an alternative 
zinc finger binding motif library. 

Example 5 describes the design of a zinc finger binding 
polypeptide which binds to a DNA sequence of special 
clinical significance. 

BRIEF DESCRIPTION OF THE DRAWINGS 

The invention will now be further described by way of 
example and with reference to the accompanying drawings, 
of which: 

FIG. 1 is a schematic representation of affinity purification 
of phage particles displaying zinc finger binding motifs 
fused to phage coat proteins; 

FIG. 2 shows three zinc fingers (Seq ID No. 2) used in the 
phage display library; 

FIG. 3 shows the DNA sequences of three oligonucleotides
(Seq ID Nos. 3-8) used in the affinity purification of
phage display particles;

FIG. 4 is a "checker board" of binding site signatures
determined for various zinc finger binding motifs (Seq ID
Nos. 19-51);

FIGS. 5A-5F show graphs of fractional saturation against
concentration of DNA (nM) for various binding motifs and
target DNA triplets;

FIG. 6 shows the nucleotide sequence of the fusion
between BCR and ABL sequences in p190 cDNA (Seq ID
No. 9) and the corresponding exon boundaries in the BCR
and ABL genes (Seq ID Nos. 10-11);

FIG. 7 shows the amino acid sequences of various zinc 
finger binding motifs (Seq ID Nos. 12-17) designed to test 
for binding to the BCR/ABL fusion; 

FIG. 8 is a graph of peptide binding (as measured by
A450-460 nm) against DNA concentration (μM) of target or
control DNA sequences;

FIG. 9 is a graph showing percentage viability against
time for various transfected cells;

FIGS. 10A-10C and 11 illustrate schematically different 
methods of designing zinc finger binding polypeptides; and 
FIG. 12 shows the amino acid sequence of zinc fingers in
a polypeptide (Seq ID No. 18) designed to bind to a
particular DNA sequence (a ras oncogene).

EXAMPLE 1

In this example the inventors have used a screening
technique to study sequence-specific DNA recognition by
zinc finger binding motifs. The example describes how a
library of zinc finger binding motifs displayed on the surface
of bacteriophage enables selection of fingers capable of
binding to given DNA triplets. The amino acid sequences of
selected fingers which bind the same triplet were compared
to examine how sequence-specific DNA recognition occurs.
The results can be rationalised in terms of coded interactions
between zinc fingers and DNA, involving base contacts
from a few α-helical positions.

An alternative to the rational but biased design of proteins
with new specificities is the isolation of desirable mutants
from a large pool. A powerful method of selecting such
proteins is the cloning of peptides (Smith 1985 Science 228,
1315-1317), or protein domains (McCafferty et al., 1990
Nature (London) 348, 552-554; Bass et al., 1990 Proteins 8,
309-314), as fusions to the minor coat protein (pIII) of
bacteriophage fd, which leads to their expression on the tip
of the capsid. Phage displaying the peptides of interest can
then be affinity purified and amplified for use in further
rounds of selection and for DNA sequencing of the cloned
gene. The inventors applied this technology to the study of
zinc finger-DNA interactions after demonstrating that
functional zinc finger proteins can be displayed on the surface of
fd phage, and that the engineered phage can be captured on
a solid support coated with specific DNA. A phage display
library was created comprising variants of the middle finger
from the DNA binding domain of Zif268 (a mouse
transcription factor containing 3 zinc fingers; Christy et al.,
1988). DNA of fixed sequence was used to purify phage
from this library over several rounds of selection, returning
a number of different but related zinc fingers which bind the
given DNA. By comparing similarities in the amino acid
sequences of functionally equivalent fingers we deduce the
likely mode of interaction of these fingers with DNA.
Remarkably, it would appear that many base contacts can
occur from three primary positions on the α-helix of the zinc
finger, correlating (in hindsight) with the implications of the
crystal structure of Zif268 bound to DNA (Pavletich & Pabo
1991). The ability to select or design zinc fingers with
desired specificity means that DNA binding proteins
containing zinc fingers can now be "made-to-measure".

Materials and Methods

Construction and cloning of genes. The gene for the first
three fingers (residues 3-101) of Transcription Factor IIIA
(TFIIIA) was amplified by PCR from the cDNA clone of
TFIIIA using forward and backward primers which contain
restriction sites for NotI and SfiI respectively. The gene for
the Zif268 fingers (residues 333-420) was assembled from
8 overlapping synthetic oligonucleotides, giving SfiI and
NotI overhangs. The genes for fingers of the phage library
were synthesised from 4 oligonucleotides by directional end
to end ligation using 3 short complementary linkers, and
amplified by PCR from the single strand using forward and
backward primers which contained sites for NotI and SfiI
respectively. Backward PCR primers in addition introduced
Met-Ala-Glu as the first three amino acids of the zinc finger
peptides, and these were followed by the residues of the wild
type or library fingers as discussed in the text. Cloning
overhangs were produced by digestion with SfiI and NotI
where necessary. Fragments were ligated to 1 μg similarly
prepared Fd-Tet-SN vector. This is a derivative of
fd-tet-DOG1 (Hoogenboom et al., 1991 Nucleic Acids Res. 19,
4133-4137) in which a section of the pelB leader and a
restriction site for the enzyme SfiI (underlined) have been
added by site-directed mutagenesis using the
oligonucleotide (Seq ID No. 1):

5' CTCCTGCAGTTGGACCTGTGCCATGGCCG
GCTGGGCCGCATAGAATGGAACAACTAAAGC 3'

which anneals in the region of the polylinker (L. Jespers,
personal communication). Electrocompetent DH5α cells
were transformed with recombinant vector in 200 ng
aliquots, grown for 1 hour in 2xTY medium with 1%
glucose, and plated on TYE containing 15 μg/ml tetracycline
and 1% glucose.

FIG. 2 shows the amino acid sequence (Seq ID No. 2) of
the three zinc fingers from Zif268 used in the phage display
library. The top and bottom rows represent the sequence of
the first and third fingers respectively. The middle row
represents the sequence of the middle finger. The
randomised positions in the α-helix of the middle finger have
residues marked "X". The amino acid positions are
numbered relative to the first helical residue (position 1). For
amino acids at positions -1 to +8, excluding the
conserved Leu and His, codons are equal mixtures of (G,A,C)NN:
T in the first base position is omitted in order to avoid stop
codons, but this has the unfortunate effect that the codons for
Trp, Phe, Tyr and Cys are not represented. Position +9 is
specified by the codon A(G,A)G, allowing either Arg or Lys.
Residues of the hydrophobic core are circled, whereas the
zinc ligands are written as white letters on black circles. The
positions forming the β-sheets and the α-helix of the zinc
fingers are marked below the sequence.

Phage Selection

Colonies were transferred from plates to 200 ml 2xTY/Zn/Tet
(2xTY containing 50 μM Zn(CH3.COO)2 and 15
μg/ml tetracycline) and grown overnight. Phage were
purified from the culture supernatant by two rounds of
precipitation using 0.2 volumes of 20% PEG/2.5M NaCl containing
50 μM Zn(CH3.COO)2, and resuspended in zinc finger
phage buffer (20 mM HEPES pH7.5, 50 mM NaCl, 1 mM
MgCl2 and 50 μM Zn(CH3.COO)2). Streptavidin-coated
paramagnetic beads (Dynal) were washed in zinc finger
phage buffer and blocked for 1 hour at room temperature
with the same buffer made up to 6% in fat-free dried
milk (Marvel). Selection of phage was over three rounds: in the
first round, beads (1 mg) were saturated with biotinylated
oligonucleotide (~80 nM) and then washed prior to phage
binding, but in the second and third rounds 1.7 nM
oligonucleotide and 5 μg poly dGC (Sigma) were added to the
beads with the phage. Binding reactions (1.5 ml) for 1 hour
at 15° C. were in zinc finger phage buffer made up to 2% in
fat-free dried milk (Marvel) and 1% in Tween 20 and
typically contained 5x10^11 phage. Beads were washed 15
times with 1 ml of the same buffer. Phage were eluted by
shaking in 0.1M triethylamine for 5 min and neutralised
with an equal volume of 1M Tris pH7.4. Log phase E. coli
TG1 in 2xTY were infected with eluted phage for 30 min at
37° C. and plated as described above. Phage titres were
determined by plating serial dilutions of the infected
bacteria.

The phage selection procedure, based on affinity
purification, is illustrated schematically in FIG. 1: zinc
fingers (A) are expressed on the surface of fd phage (B) as
fusions to the minor coat protein (C). The third finger is
mainly obscured by the DNA helix. Zinc finger phage are
bound to 5'-biotinylated DNA oligonucleotide [D] attached
to streptavidin-coated paramagnetic beads [E], and captured
using a magnet [F]. (Figure adapted from Dynal AS and also
Marks et al. (1992 J. Biol. Chem. 267, 16007-16105).)
FIG. 3 shows sequences (Seq ID Nos. 3-8) of DNA
oligonucleotides used to purify (i) phage displaying the first
three fingers of TFIIIA, (ii) phage displaying the three
fingers of Zif268, and (iii) zinc finger phage from the phage
display library. The Zif268 consensus operator sequence
used in the X-ray crystal structure (Pavletich & Pabo 1991
Science 252, 809-817) is highlighted in (ii), and in (iii)
where "X" denotes a base change from the ideal operator in
oligonucleotides used to purify phage with new specificities.
Biotinylation of one strand is shown by a circled "B".

Sequencing of Selected Phage

Single colonies of transformants obtained after three
rounds of selection as described were grown overnight in
2xTY/Zn/Tet. Small aliquots of the cultures were stored in
15% glycerol at -20° C., to be used as an archive.
Single-stranded DNA was prepared from phage in the culture
supernatant and sequenced using the modified T7 DNA
polymerase SEQUENASE™ 2.0 kit (U.S. Biochemical
Corp.).

Results and Discussion 

Phage display of 3-finger DNA-Binding Domains from 
TFIIIA or Zif268. Prior to the construction of a phage 
display library, the inventors demonstrated that peptides 
containing three fully functional zinc fingers could be dis- 
played on the surface of viable fd phage when cloned in the 
vector Fd-Tet-SN. In preliminary experiments, the inventors
cloned as fusions to pIII firstly the three N-terminal fingers
from TFIIIA (Ginsberg et al., 1984 Cell 39, 479-489), and
secondly the three fingers from Zif268 (Christy et al., 1988),
for both of which the DNA binding sites are known. Peptide
fused to the minor coat protein was detected in Western blots
using an anti-pIII antibody (Stengele et al., 1990 J. Mol.
Biol. 212, 143-149). Approximately 10-20% of total pIII
in phage preparations was present as fusion protein.

Phage displaying either set of fingers were capable of 
binding to specific DNA oligonucleotides, indicating that 
zinc fingers were expressed and correctly folded in both 
instances. Paramagnetic beads coated with specific
oligonucleotide were used as a medium on which to capture
DNA-binding phage, and were consistently able to return
between 100 and 500-fold more such phage, compared to
free beads or beads coated with non-specific DNA.
Alternatively, when phage displaying the three fingers of
Zif268 were diluted 1:1.7x10^3 with Fd-Tet-SN phage not
bearing zinc fingers, and the mixture incubated with beads
coated with Zif268 operator DNA, one in three of the total
phage eluted and transfected into E. coli were shown by
colony hybridisation to carry the Zif268 gene, indicating an
enrichment factor of over 500 for the zinc finger phage.
Hence it is clear that zinc fingers displayed on fd phage are
capable of preferential binding to DNA sequences with
which they can form specific complexes, making possible
the enrichment of wanted phage by factors of up to 500 in
a single affinity purification step. Therefore, over multiple
rounds of selection and amplification, very rare clones
capable of sequence-specific DNA binding can be selected
from a large library.
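
The enrichment figure quoted above can be checked with a short calculation. The following is only an illustrative sketch: the function name is hypothetical, and the dilution is taken as 1 part Zif268 phage to 1.7x10^3 parts Fd-Tet-SN phage, which is an assumed reading chosen because it is consistent with the stated enrichment of "over 500".

```python
def enrichment_factor(dilution: float, recovered_fraction: float) -> float:
    """Fold-enrichment of a clone in a single affinity purification step.

    dilution: parts of background phage per part of zinc finger phage
              before selection (a 1:dilution mixture).
    recovered_fraction: fraction of eluted phage that carry the clone.
    """
    input_fraction = 1.0 / (1.0 + dilution)
    return recovered_fraction / input_fraction


# Assumed reading of the experiment: 1:1.7x10^3 dilution before selection,
# one in three eluted phage carrying the Zif268 gene afterwards.
factor = enrichment_factor(dilution=1.7e3, recovered_fraction=1 / 3)
print(f"enrichment factor is roughly {factor:.0f}")
```

Under these assumptions the factor comes out between 500 and 600, in line with the text.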

A phage display library of zinc fingers from Zif268. The
inventors have made a phage display library of the three
fingers of Zif268 in which selected residues in the middle
finger are randomised (FIG. 2), and have isolated phage
bearing zinc fingers with desired specificity using a modified
Zif268 operator sequence (Christy & Nathans 1989 Proc.
Natl. Acad. Sci. USA 86, 8737-8741) in which the middle
DNA triplet is altered to the sequence of interest (FIG. 3). In
order to be able to study both the primary and secondary
putative base recognition positions which are suggested by
database analysis (Jacobs 1992 EMBO J. 11, 4507-4517),
the inventors have designed the library of the middle finger
so that, relative to the first residue in the α-helix (position
+1), positions -1 to +8, but excluding the conserved Leu and
His, can be any amino acid except Phe, Tyr, Trp and Cys
which occur only rarely at those positions (Jacobs 1993
Ph.D. thesis, University of Cambridge). In addition, the
inventors have allowed position +9 (which might make an
inter-finger contact with Ser at position -2 (Pavletich &
Pabo 1991)) to be either Arg or Lys, the two most frequently
occurring residues at that position.

The logic of this protocol, based upon the Zif268 crystal 
structure (Pavletich & Pabo 1991), is that the randomised 
finger is directed to the central triplet since the overall 
register of protein-DNA contacts is fixed by its two neigh- 
bours. This allows the examination of which amino acids in 
the randomised finger are the most important in forming 
specific complexes with DNA of known sequence. Since
comprehensive variations are programmed in all the putative
contact positions of the α-helix, it is possible to conduct an
objective study of the importance of each position in
DNA-binding (Jacobs 1992).

The size of the phage display library required, assuming
full degeneracy of the 8 variable positions, is
(16^7 x 2^1) = 5.4 x 10^8, but because of practical limitations
in the efficiency of transformation with Fd-Tet-SN, the
inventors were able to clone only 2.6 x 10^6 of these. The
library used is therefore some two hundred times smaller than
the theoretical size necessary to cover all the possible
variations of the α-helix. Despite this shortfall, it has been
possible to isolate phage which bind with high affinity and
specificity to given DNA sequences, demonstrating the
remarkable versatility of the zinc finger motif.
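
The library arithmetic can be reproduced from the codon scheme. The sketch below is illustrative only and rests on two assumptions: that the seven fully randomised positions use codons whose first base is G, A or C (any base at the other two positions), and that position +9 uses the codon A(G,A)G. It uses the standard genetic code; all variable names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code with codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
ALL_AMINO_ACIDS = set(AMINO) - {"*"}

# Randomised positions: first base G, A or C; other two bases unrestricted.
library_codons = ["".join(c) for c in product("GAC", BASES, BASES)]
encoded = {CODE[c] for c in library_codons}
missing = ALL_AMINO_ACIDS - encoded

# Position +9: codon A(G,A)G.
pos9 = {CODE["A" + b + "G"] for b in "GA"}

# Seven fully randomised positions plus position +9.
library_size = len(encoded) ** 7 * len(pos9)
cloned = 2.6e6
shortfall = library_size / cloned

print(sorted(encoded), sorted(missing), sorted(pos9))
print(f"theoretical size {library_size:.2e}, shortfall about {shortfall:.0f}-fold")
```

Omitting T at the first base removes all three stop codons, at the cost of Phe, Tyr, Cys and Trp, which leaves 16 amino acids per randomised position and gives the 16^7 x 2 figure used above.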

Amino acid-base contacts in zinc finger-DNA complexes
deduced from phage display selection. Of the 64 base triplets
that could possibly form the binding site for variations of
finger 2, the inventors have so far used 32 in attempts to
isolate zinc finger phage as described. Results from these
selections are shown in Table 1, which lists amino acid
sequences (Seq ID Nos. 52-118) of the variant α-helical
regions from clones of library phage selected after 3 rounds
of screening with variants of the Zif268 operator.

TABLE 1

[Table 1 is not legible in this copy. For each target triplet
used in selection (those recoverable are CAG, TGA, GCC, GTC,
GCA, GCT, ACG, ATG, GTA, TTG, CCG, GCG and GTG), it lists the
number of occurrences of each selected clone and the sequence
of its variant α-helical region, positions -1 to +9.]
In Table 1, the amino acid sequences, aligned in the one
letter code, are listed alongside the DNA oligonucleotides (a
to p) used in their purification. The latter are denoted by the
sequence of the central DNA triplet in the "bound" strand of
the variant Zif268 operator. The amino acid positions are
numbered relative to the first helical residue (position 1), and
the three primary recognition positions are highlighted. The
accompanying numbers indicate the independent
occurrences of that clone in the sequenced population (5-10
colonies); where numbers are in parentheses, the clone(s)
were detected in the penultimate round of selection but not
in the final round. In addition to the DNA triplets shown
here, others were also used in attempts to select zinc finger
phage from the library, but most selected two clones, one
having the α-helical sequence KASNLVSHIR (Seq ID No.
119) and the other having the sequence LRHNLETHMR
(Seq ID No. 120). Those triplets were: ACT, AAA, TTT,
CCT, CTT, TTC, AGT, CGA, CAT, AGA, AGC and AAT.
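
The numbering convention can be made concrete with the two recurrent clones named above. The following sketch is illustrative only: the helper names are hypothetical, and it assumes that each ten-residue sequence runs from position -1 to +9 with no position 0.

```python
HELIX_POSITIONS = (-1, 1, 2, 3, 4, 5, 6, 7, 8, 9)
PRIMARY_POSITIONS = (-1, 3, 6)


def by_position(helix: str) -> dict:
    """Map each residue of a 10-residue helical sequence to its position."""
    if len(helix) != len(HELIX_POSITIONS):
        raise ValueError("expected 10 residues, positions -1 to +9")
    return dict(zip(HELIX_POSITIONS, helix))


def primary_residues(helix: str) -> tuple:
    """Residues at the three primary recognition positions (-1, +3, +6)."""
    positions = by_position(helix)
    return tuple(positions[p] for p in PRIMARY_POSITIONS)


for name, helix in (("Seq ID No. 119", "KASNLVSHIR"),
                    ("Seq ID No. 120", "LRHNLETHMR")):
    positions = by_position(helix)
    # The conserved Leu (+4) and His (+7) act as a check on the alignment.
    assert positions[4] == "L" and positions[7] == "H"
    print(name, primary_residues(helix))
```

Both sequences carry the conserved Leu at +4 and His at +7 and end in Arg at +9, which supports this alignment.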



In general the inventors have been unable to select zinc
fingers which bind specifically to triplets without a 5' or 3'
guanine, all of which return the same limited set of phage
after three rounds of selection (see above). However, for each
of the other triplets used to screen the library, a family of zinc
finger phage is recovered. In these families is found a
sequence bias in the randomised α-helix, which is
interpreted as revealing the position and identity of amino acids
used to contact the DNA. For instance, the middle fingers
from the 8 different clones selected with the triplet GAT
(Table 1d) all have Asn at position +3 and Arg at position +6,
just as does the first zinc finger of the Drosophila protein
tramtrack, in which they are seen making contacts to the
same triplet in the cocrystal with specific DNA (Fairall et al.,
1993). This indicates that the positional recurrence of a
particular amino acid in functionally equivalent fingers is
unlikely to be coincidental, but rather occurs because it has a
functional role. Thus using data collected from the phage
display library (Table 1) it is possible to infer most of the
specific amino acid-DNA interactions. Remarkably, most of
the results can be rationalised in terms of contacts from the
three primary α-helical positions (-1, +3 and +6) identified
by X-ray crystallography (Pavletich & Pabo 1991) and
database analysis (Jacobs 1992).
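
The observation about terminal guanines can be stated as a simple classification of the 64 triplets. This is a sketch under stated assumptions: the list of triplets that returned only the two recurrent clones is taken from the preceding passage as read here (the last entry, AAT, is an assumed reading), and the function name is hypothetical.

```python
from itertools import product


def has_terminal_g(triplet: str) -> bool:
    """True if the triplet has guanine at its 5' or 3' end."""
    return triplet[0] == "G" or triplet[-1] == "G"


all_triplets = ["".join(t) for t in product("ACGT", repeat=3)]
without_terminal_g = [t for t in all_triplets if not has_terminal_g(t)]

# Triplets reported to return the same two clones (assumed reading).
RECURRENT_CLONE_TRIPLETS = ["ACT", "AAA", "TTT", "CCT", "CTT", "TTC",
                            "AGT", "CGA", "CAT", "AGA", "AGC", "AAT"]

print(len(all_triplets), len(without_terminal_g))
print(all(not has_terminal_g(t) for t in RECURRENT_CLONE_TRIPLETS))
```

Of the 64 possible triplets, 36 lack a terminal guanine, and every triplet in the assumed list falls in that class.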

As has been pointed out before (Berg 1992 Proc. Natl.
Acad. Sci. USA 89, 11109-11110), guanine has a
particularly important role in zinc finger-DNA interactions. When
present at the 5' (e.g. Table 1c-i) or 3' (e.g. Table 1m-o) end
of a triplet, G selects fingers with Arg at position +6 or -1
of the α-helix respectively. When G is present in the middle
position of a triplet (e.g. Table 1b), the preferred amino acid
at position +3 is His. Occasionally, G at the 5' end of a triplet
selects Ser or Thr at +6 (e.g. Table 1p). Since G can only be
specified absolutely by Arg (Seeman et al., 1976 Proc. Natl.
Acad. Sci. USA 73, 804-808), this is the most common
determinant at -1 and +6. One can expect this type of
contact to be a bidentate hydrogen bonding interaction as
seen in the crystal structures of Zif268 (Pavletich & Pabo
1991 Science 252, 809-817) and tramtrack (Fairall et al.,
1993). In these structures, and in almost all of the selected
fingers in which Arg recognises G at the 3' end, Asp occurs
at position +2 to buttress the long Arg side chain (e.g. Table
1o,p). When position -1 is not Arg, Asp rarely occurs at +2,
suggesting that in this case any other contacts it might make
with the second DNA strand do not contribute significantly
to the stability of the protein-DNA complex.

Adenine is also an important determinant of sequence
specificity, recognised almost exclusively by Asn or Gln
which again are able to make bidentate contacts (Seeman et
al., 1976). When A is present at the 3' end of a triplet, Gln
is often selected at position -1 of the α-helix, accompanied
by small aliphatic residues at +2 (e.g. Table 1b). Adenine in
the middle of the triplet strongly selects Asn at +3 (e.g. Table
1c-e), except in the triplet CAG (Table 1a) which selected
only two types of finger, both with His at +3 (one being the
wild-type Zif268 which contaminated the library during this
experiment). The triplets ACG (Table 1j) and ATG (Table
1k), which have A at the 5' end, also returned oligoclonal
mixtures of phage, the majority of which were of one clone
with Asn at +6.
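
The guanine and adenine preferences described in the two preceding paragraphs can be collected in a small lookup. This is a hedged sketch, not a design rule: the table records only the most frequently selected residue for each case, the names are hypothetical, and the text itself cautions that such preferences need confirmation by specificity assays.

```python
# (base, place in triplet) -> (helix position, residue most often selected)
PREFERENCES = {
    ("G", "5'"): (6, ("R",)),    # occasionally Ser or Thr instead
    ("G", "mid"): (3, ("H",)),
    ("G", "3'"): (-1, ("R",)),   # usually buttressed by Asp at +2
    ("A", "5'"): (6, ("N",)),    # based on the ACG and ATG selections only
    ("A", "mid"): (3, ("N",)),   # CAG is a noted exception (His at +3)
    ("A", "3'"): (-1, ("Q",)),
}
PLACES = ("5'", "mid", "3'")


def suggest(triplet: str) -> dict:
    """Return {helix position: residues} for the G and A bases of a triplet."""
    suggestion = {}
    for base, place in zip(triplet.upper(), PLACES):
        if (base, place) in PREFERENCES:
            position, residues = PREFERENCES[(base, place)]
            suggestion[position] = residues
    return suggestion


print(suggest("GAT"))
```

For GAT the lookup gives Arg at +6 and Asn at +3, matching the family of clones selected with that triplet; cytosine and thymine are left out because their preferences are less clear-cut.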

In theory, cytosine and thymine cannot reliably be
discriminated by a hydrogen bonding amino acid side chain in
the major groove (Seeman et al., 1976). Nevertheless, C in
the 3' position of a triplet shows a marked preference for Asp
or Glu at position -1, together with Arg at +1 (e.g. Table
1e-g). Asp is also sometimes selected at +3 and +6 when C
is in the middle (e.g. Table 1o) and 5' (e.g. Table 1a) position
respectively. Although Asp can accept a hydrogen bond from
the amino group of C, one should note that the positive
molecular charge of C in the major groove (Hunter 1993 J.
Mol. Biol. 230, 1025-1054) will favour an interaction with
Asp regardless of hydrogen bonding contacts.

However, C in the middle position most frequently selects
Thr (e.g. Table 1i), Val or Leu (e.g. Table 1o) at +3.
Similarly, T in the middle position most often selects Ser
(e.g. Table 1i), Ala or Val (e.g. Table 1p) at +3. The aliphatic
amino acids are unable to make hydrogen bonds but Ala
probably has a hydrophobic interaction with the methyl
group of T, whereas a longer side chain such as Leu can
exclude T and pack against the ring of C. When T is at the
5' end of a triplet, Ser and Thr are selected at +6 (as is
occasionally the case for G at the 5' end). Thymine at the 3'
end of a triplet selects a variety of polar amino acids at -1
(e.g. Table 1d), and occasionally returns fingers with Ser at
+2 (e.g. Table 1a) which could make a contact as seen in the
tramtrack crystal structure (Fairall et al., 1993).

Limitations of Phage Display

From Table 1 it can be seen that a consensus or bias
usually occurs in two of the three primary positions (-1, +3
and +6) for any family of equivalent fingers, suggesting that
in many cases phage selection is by virtue of only two base
contacts per finger, as is observed in the Zif268 crystal
structure (Pavletich & Pabo 1991). Accordingly, identical
finger sequences are often returned by DNA sequences
differing by one base in the central triplet. One reason for
this is that the phage display selection, being essentially
purification by affinity, can yield zinc fingers which bind
equally tightly to a number of DNA triplets and so are unable
to discriminate. Secondly, since complex formation is
governed by the law of mass action, affinity selection can favour
those clones whose representation in the library is greatest
even though their true affinity for DNA is less than that of
other clones less abundant in the library. Phage display
selection by affinity is therefore of limited value in
distinguishing between permissive and specific interactions
beyond those base contacts necessary to stabilise the
complex. Thus in the absence of competition from fingers which
are able to bind specifically to a given DNA, the tightest
non-specific complexes will be selected from the phage
library. Consequently, results obtained by phage display
selection from a library must be confirmed by specificity
assays, particularly when that library is of limited size.

Conclusion

The amino acid sequence biases observed within a family
of functionally equivalent zinc fingers indicate that, of the
α-helical positions randomised in this study, only three
primary (-1, +3 and +6) and one auxiliary (+2) positions are
involved in recognition of DNA. Moreover, a limited set of
amino acids are to be found at those positions, and it is
presumed that these make contacts to bases. The indications
therefore are that a code can be derived to describe zinc
finger-DNA interactions. At this stage however, although
sequence homologies are strongly suggestive of amino acid
preferences for particular base-pairs, one cannot confidently
deduce such rules until the specificity of individual fingers
for DNA triplets is confirmed. The inventors therefore defer
making a summary table of these preferences until the
following example, in which is described how randomised
DNA binding sites can be used to this end.

While this work was in progress, a paper by Rebar and
Pabo was published (Rebar & Pabo 1994 Science 263,
671-673) in which phage display was also used to select
zinc fingers with new DNA-binding specificities. These
authors constructed a library in which the first finger of
Zif268 is randomised, and screened with tetranucleotides to
take into account end effects such as additional contacts
from variants of this finger. Only 4 positions (-1, +2, +3 and
+6) were randomised, chosen on the basis of the earlier
X-ray crystal structures. The results presented above, in
which more positions were randomised, to some extent
justify Rebar and Pabo's use of the four random positions
without apparent loss of effect, although further selections
may reveal that the library is compromised. However,
randomising only four positions decreases the theoretical
library size so that full degeneracy can be achieved in
practice. Nevertheless the inventors found that the results
obtained by Rebar and Pabo by screening their complete
library with two variant Zif268 operators are in agreement
with their conclusions derived from an incomplete library.
On the one hand this again highlights the versatility of zinc
fingers but, remarkably, so far both studies have been unable
to produce fingers which bind to the sequence CCT. It will
be interesting to see whether sequence biases such as we
have detected would be revealed, if more selections were
performed using Rebar and Pabo's library. In any case it
would be desirable to investigate the effects on selections of
using different numbers of randomised positions in more
complete libraries than have been used so far.

The original position or context of the randomised finger
in the phage display library might bear on the efficacy of
selected fingers when incorporated into a new DNA-binding
domain. Selections from a library of the outer fingers of a
three finger peptide (Rebar & Pabo, 1994 Science 263,
671-673; Jamieson et al., 1994 Biochemistry 33,
5689-5695) are capable of producing fingers which bind
DNA in various different modes, while selections from a
library of the middle finger should produce motifs which are
more constrained. Accordingly, Rebar and Pabo did not
assume that the first finger of Zif268 will always bind a
triplet, and screened with a tetranucleotide binding site to
allow for different binding modes. Thus motifs selected from
libraries of the outer fingers might prove less amenable to
the assembly of multifinger proteins, since binding of these
fingers could be perturbed on constraining them to a
particular binding mode, as would be the case for fingers which
had to occupy the middle position of an assembled
three-finger protein. In contrast, motifs selected from libraries of
the middle finger, having been originally constrained, will
presumably be able to preserve their mode of binding even
when placed in the outer positions of an assembled
DNA-binding domain.

FIGS. 10A-10C show different strategies for the design
of tailored zinc finger proteins. (A) A three-finger
DNA-binding motif is selected en bloc from a library of three
randomised fingers. (B) A three-finger DNA-binding motif
is assembled out of independently selected fingers from a
library of one randomised finger (e.g. the middle finger of
Zif268). (C) A three-finger DNA-binding motif is assembled
out of independently selected fingers from three positionally
specified libraries of randomised zinc fingers.

FIG. 11 illustrates the strategy of combinatorial assembly
followed by en bloc selection. Groups of triplet-specific zinc
fingers (A) isolated by phage display selection are
assembled in random combinations and re-displayed on
phage (B). A full-length target site (C) is used to select en
bloc the most favourable combination of fingers (D).

EXAMPLE 2

This example describes a new technique to deal efficiently
with the selection of a DNA binding site for a given zinc
finger (essentially the converse of Example 1). This is
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desirable as a safeguard against spurious selections based on with PBS/Zn. The "bound" strand of each oligonucleotide 
the screening of display libraries. This may be done by library was made synthetically and the other strand extended 
screening against libraries of DNA triplet binding sites from a 5" -bioLinylaled universal primer using DNA poly- 
randomised in two positions but having one base fixed in the merase I (Klenow fragment). Fill-in reactions were added to 
third position. The technique is applied here to determine the 5 wells (0.8 poiole DNA library in each) in PBS/Zn for 15 
specificity of fingers previously selected by phage display. minutes, then washed once with PBS/Zn containing 0.1% 
The inventors found that some of these fingers are able to Twcen, and once again with PBS/Zn. Overnight bacterial 
specify a unique base in each position of the cognate triplet. cultures each containing a selected zinc finger phage were 
This is further illustrated by examples of fingers which can Srown in 2xTY containing 50 mM Zn(CH3.C00) 2 and 15 
discriminate between closely related triplets as measured by 10 i " g/ ' ml tetrac y cuDe at 3 0° C. Culture supernatants containing 
their respective equilibrium dissociation constants. Compar- pha§e . T^lf^ tenfold b ^ the additio ° of PBS/Zn 
ing the amino acid sequences of fingers which specify a 2? ta T? 2%M4 ™ milk (Marvel), 1% Tween and 

amenawe to a code. Tween> aQd theQ 3 ^ ^ pBS/Zn ^ 

One can determine the optimal binding sites of these (and detected as described previously (Griffiths et at., 1994 

other) proteins, by selection from libraries of randomised EMBO J. In press), or using IIRP-conjugated anti-M13 IgG 

DNA. This approach, the principle of which is essentially (Pharmacia), and quantitated using software package SOFT- 

the converse of zinc finger phage display, would provide an 20 MAX 2.32 (Molecular Devices Corp), 
equally informative database from which the same rules can The results are shown in FIG. 4, which gives the binding 

be independently deduced. However until now, the favoured site signatures of individual zinc finger phage. The figure 

method for binding site determination (involving iterative represents binding of zinc finger phage to randomised DNA 

selection and amplification of target DNA followed by immobOised in the wells of microtitre plates. To test each 

sequencing), has been a laborious process not conveniently 25 zinc finger phage against each oligonucleotide library (see 

applicable to the analysis of a large database (Thiesen & above), DNAlibraries are applied to columns of wells (down 

Bach 1990 Nucleic Acids Res. 18, 3203-3209; Pollock & the plate), while rows of wells (across the plate) contain 

Treisman 1990 Nucleic Acids Res. 18, 6197-6204). equal volumes of a solution of a zinc finger phage. The 

This example presents a convenient and rapid new identity of each library is given as the middle triplet of the 

method which can reveal the optimal binding site(s) of a 30 " Dounil " strand of Zif268 operator, where N represents a 

DNA binding protein by single step selection from small mixture of all 4 nucleotides. The zinc finger phage is 

libraries and use this to check the binding site preferences of specified by the sequence of the variable region of the 

those zinc fingers selected previously by phage display. For middle finger, numbered relative to the first helical residue 

this application, the inventors have used 12 different mini- (position 1), and the three primary recognition positions are 

libraries of the Zif268 binding site, each one with the central 35 highlighted. Bound phage are detected by an enzyme immu- 

triplet having one position defined with a particular base pair noassay. The approximate strength of binding is indicated by 

and the other two positions randomised. Each library there- a £ r ey scale proportional to the enzyme activity. From the 

fore comprises 16 oligonucleotides and offers a number of pattern of binding to DNAlibraries, called the "signature" of 

potential binding sites to the middle finger, provided that the eacn clone , one or a small number of binding sites can be 

latter can tolerate the defined base pair. Each zinc finger 40 reac * °^ anc * these are written on the right of the figure, 

phage is screened against all 12 libraries individually immo- Determination of Apparent Equilibrium Dissociation Con- 

bilised in wells of a microtitre plate, and binding is detected stants 

by an enzyme immunoassay. Thus a pattern of acceptable Overnight bacterial cultures were grown in 2xTY/Zn/Tet 

bases at each position is disclosed, which the inventors term at 30 ° c - Culture supernatants containing phage were diluted 

a "binding site signatare" The information contained in a 45 twofold by the addition of PBS/Zn containing 4% fat-free 

binding site signature encompasses the repertoire of binding dried nnlk (Marvel), 2% Tween and 40 jug/ml sonicated 

sites recognised by a zinc finger. salmon sperm DNA. Binding reactions, containing appro- 

The binding site signatures obtained, using zinc finger priate concentrations of specific 5' -biotinylated DNA and 

phage selected as described in example 1, reveal that the equai volumes of zinc fi nger phage solution, were allowed 

selection has yielded some highly sequence-specific zinc 50 to ^nilibrate for 1 h at 20° C. All DNA was captured on 

finger binding motifs which discriminate at all three posi- strcptavidin-coatcd paramagnetic beads (500 /*g per well) 

tions of a triplet. From measurements of equilibrium disso- which were subs equently washed 6 times with PBS/Zn 

ciation constants it is found that these fingers bind tightly to containing 1% Tween and then 3 times with PBS/Zn. Bound 

the triplets indicated in their signatures, and discriminate pha S e were det ected using HRP-conjugated anti-M13 IgG 

against closely related siLes (usually bv at least a factor of 55 (Pharmacia) and developed as described (Griffiths et al., 

ten). The binding site signatures allow progress towards a 1994 )* °P tical densities were quantitated using software 

specificity code for the interactions of zinc fingers with package SOFTMAX 2,32 (Molecular Devices Corp). 
DNA. & 'The results are shown in FIGS 5A-5F, which is a series 

of graphs of fractional saturation against concentration of 

_ . 0 . 0 . Mate nals and Methods 60 DNA (nM). The two outer fingers carry the native sequence, 

Binding Site Signatures as do the the two cognate outer DNA triplets. The sequence 

Flexible flat-bottomed 96-wcll microtitre plates (Falcon) of amino acids occupying helical positions -1 to +9 of the 

^^nT^ at 4 ° C With stre P tavidin ( ai m &' ml var ^d finger are shown in each case. The graphs show that 

in 0.1M NaIIC0 3 pII8.6, 0.03% NaN 3 ). Wells were blocked the middle finger can discriminate closely related triplets 

for one hour with PBS/Zn (PBS, 50 fiM Zn (CH3.C00) 2 ) 65 usually by a factor of tea. The graphs allowed the determi- 

contaimng 2% fat-free dried milk (Marvel), washed 3 times nation of apparent equilibrium dissociation constants as 

with PBS/Zn containing 0.1% Tween, and another 3 times below. 
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Estimations of the K d are by fitting to the equation 
K^-[DNA ].[P ]/[DNA.P], using the software package 
KALEIDAGRAPH™ Version 2.0 programme (Abelbeck 
Software). Owing to the sensitivity of the ELISA used to 
detect protein-DNA complex, the inventors were able to use 5 
zinc finger phage concentrations far below those of the 
DNA, as is required for accurate calculations of the K^. The 
technique used here has the advantage that while the con- 
centration of DNA (variable) must be known accurately, that 
of the zinc fingers (constant) need not be known (Choo & 
Klug 1993 Nucleic Acids Res. 21, 3341-3346). This cir- 
cumvents the problem of calculating the number of zinc 
finger peptides expressed on the tip of each phage, although 
since only 10-20% of the gene III protein (pill) carries such 
peptides one would expect on average less than one copy per 
phage. Binding is performed in solution to prevent any 
effects caused by the avidity (Marks et al., 1992) of phage 
for DNA immobilised on a surface. Moreover, in this case 
measurements of by ELISA are made possible since 
equilibrium is reached in solution prior to capture on the 2Q 
solid phase. 
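
The fit described above can be illustrated with a short sketch. It assumes, as the text requires, that the phage concentration is far below that of the DNA, so that free [DNA] can be taken as the total and the fractional saturation becomes theta = [DNA]/(Kd + [DNA]). The original analysis used KALEIDAGRAPH; the golden-section least-squares search below is a stand-in chosen for illustration, the function names are hypothetical, and the data points are synthetic rather than taken from FIGS. 5A-5F.

```python
import math


def fractional_saturation(dna_nM, kd_nM):
    """Fraction of phage bound: theta = [DNA] / (Kd + [DNA]).

    Follows from Kd = [DNA].[P]/[DNA.P] when [P] << [DNA], so that
    free [DNA] is approximated by the total DNA concentration.
    """
    return dna_nM / (kd_nM + dna_nM)


def sum_squared_error(kd_nM, dna_nM, theta_obs):
    """Sum of squared residuals between model and observed saturation."""
    return sum((fractional_saturation(d, kd_nM) - t) ** 2
               for d, t in zip(dna_nM, theta_obs))


def fit_kd(dna_nM, theta_obs, lo_nM=1e-3, hi_nM=1e4, iterations=200):
    """Estimate Kd (nM) by golden-section search over log10(Kd).

    Illustrative substitute for the curve-fitting package named in the
    text; the search bounds are assumptions.
    """
    a, b = math.log10(lo_nM), math.log10(hi_nM)
    ratio = (math.sqrt(5) - 1) / 2
    c = b - ratio * (b - a)
    d = a + ratio * (b - a)
    for _ in range(iterations):
        err_c = sum_squared_error(10 ** c, dna_nM, theta_obs)
        err_d = sum_squared_error(10 ** d, dna_nM, theta_obs)
        if err_c < err_d:
            # Minimum lies in [a, d]
            b, d = d, c
            c = b - ratio * (b - a)
        else:
            # Minimum lies in [c, b]
            a, c = c, d
            d = a + ratio * (b - a)
    return 10 ** ((a + b) / 2)


if __name__ == "__main__":
    # Synthetic titration generated from an assumed Kd of 5 nM.
    dna = [0.5, 1, 2, 5, 10, 20, 50, 100]
    theta = [fractional_saturation(x, 5.0) for x in dna]
    print(fit_kd(dna, theta))
```

Only the DNA concentrations enter the model, which reflects the point made in the text that the concentration of zinc finger phage need not be known.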

Results and Discussion 
The Binding Site Signature of the Second Finger of Zif268

The top row of FIG. 4 shows the signature of the second
finger of wild type Zif268. From the pattern of strong signals
indicating binding to oligonucleotide libraries having GNN,
TNN, NGN and NNG as the middle triplet, it emerges that
the optimal binding site for this finger is T/G,G,G, in accord
with the published consensus sequence (Christy & Nathans
1989 Proc. Natl. Acad. Sci. USA 86, 8737-8741). This has
implications for the interpretation of the X-ray crystal
structure of Zif268 solved in complex with consensus operator
having TGG as the middle triplet (Pavletich & Pabo 1991).
For instance, His at position +3 of the middle finger was
modelled as donating a hydrogen bond to N7 of G, suggesting
an equivalent contact to be possible with N7 of A, but
from the binding site signature we can see that there is
discrimination against A. This implies that the His may
prefer to make a hydrogen bond to O6 of G or a bifurcated
hydrogen bond to both O6 and N7, or that a steric clash with
the amino group of A may prevent a tight interaction with
this base. Thus by considering the stereochemistry of double
helical DNA, binding site signatures can give insight into the
details of zinc finger-DNA interactions.
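
Reading a signature can be sketched in a few lines. The example below builds the twelve mini-libraries (one defined base, two randomised positions), expands one of them into its 16 triplets, and converts the set of libraries that gave strong signals into the acceptable bases at each position. The function names are hypothetical, and treating each library as simply "bound" or "not bound" is a simplifying assumption; the figure itself reports a grey scale.

```python
from itertools import product

BASES = "GATC"


def mini_libraries():
    """Names of the 12 libraries: one fixed base, the other two positions N."""
    names = []
    for position in range(3):
        for base in BASES:
            name = ["N", "N", "N"]
            name[position] = base
            names.append("".join(name))
    return names


def expand(library):
    """All triplets contained in a library such as 'GNN' (16 in total)."""
    choices = [BASES if ch == "N" else ch for ch in library]
    return ["".join(t) for t in product(*choices)]


def read_signature(bound_libraries):
    """Acceptable bases at the 5', middle and 3' positions of the triplet."""
    allowed = [set(), set(), set()]
    for library in bound_libraries:
        for position, ch in enumerate(library):
            if ch != "N":
                allowed[position].add(ch)
    return allowed


def candidate_sites(bound_libraries):
    """Triplets consistent with the signature; empty if a position has no signal."""
    allowed = read_signature(bound_libraries)
    if any(not bases for bases in allowed):
        return []
    return sorted("".join(t) for t in product(*[sorted(b) for b in allowed]))


if __name__ == "__main__":
    # Libraries reported above as giving strong signals for finger 2 of Zif268.
    zif268_finger2 = ["GNN", "TNN", "NGN", "NNG"]
    print(len(mini_libraries()), len(expand("GNN")))
    print(candidate_sites(zif268_finger2))
```

For the wild-type second finger this reproduces the T/G,G,G reading given in the text, that is, the two triplets GGG and TGG.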
Amino Acid-base Contacts in Zinc Finger-DNA Complexes
Deduced from Binding Site Signatures

The binding site signatures of other zinc fingers reveal
that the phage selections performed in example 1 yielded
highly sequence-specific DNA binding proteins. Some of
these are able to specify a unique sequence for the middle
triplet of a variant Zif268 binding site, and are therefore
more specific than is Zif268 itself for its consensus site.
Moreover, one can identify the fingers which recognise a
particular oligonucleotide library, that is to say a specific
base at a defined position, by looking down the columns of
FIG. 4. By comparing the amino acid sequences of these
fingers one can identify any residues which have genuine
preferences for particular bases on bound DNA. With a few
exceptions, these are as previously predicted on the basis of
phage display, and are summarised in Table 2.

Table 2 summarises frequently observed amino acid-base
contacts in interactions of selected zinc fingers with DNA.
The given contacts comprise a "syllabic" recognition code
for appropriate triplets. Cognate amino acids and their
positions in the α-helix are entered in a matrix relating each
base to each position of a triplet. Auxiliary amino acids from
position +2 can enhance or modulate specificity of amino
acids at position -1 and these are listed as pairs. Ser or Thr
at position +6 permit Asp +2 of the following finger (denoted
Asp ++2) to specify both G and T indirectly, and the pairs
are listed. The specificity of Ser +3 for T and Thr +3 for C may
be interchangeable in rare instances while Val +3 appears to
be consistently ambiguous.



TABLE 2

                    POSITION IN TRIPLET
        5'                  MIDDLE      3'

G       Arg +6              His +3      Arg -1/Asp +2
        Ser +6/Asp ++2
        Thr +6/Asp ++2

A                           Asn +3      Gln -1/Ala +2

T       Ser +6/Asp ++2      Ala +3      Asn -1
        Thr +6/Asp ++2      Ser +3      Gln -1/Ser +2
                            Val +3

C                           Asp +3      Asp -1
                            Leu +3
                            Thr +3
                            Val +3

The binding site signatures also reveal a feature of the
phage display library which is important to the
interpretation of the selection results. All the fingers in our
panel, regardless of the amino acid present at position +6,
are able to recognise G or both G and T at the 5' end of a
triplet. The probable explanation for this is that the 5'
position of the middle triplet is fixed as either G or T by a
contact from the invariant Asp at position +2 of finger 3 to
the partner of either base on the complementary strand,
analogous to those seen in the Zif268 (Pavletich & Pabo
1991 Science 252, 809-817) and tramtrack (Fairall et al.,
1993) crystal structures (a contact to NH2 of C or A
respectively in the major groove). Therefore Asp at position
+2 of finger 3 is dominant over the amino acid present at
position +6 of the middle finger, precluding the possibility of
recognition of A or C at the 5' position. Future libraries must
be designed with this interaction omitted or the position
varied. Interestingly, given the framework of the conserved
regions of the three fingers, one can identify a rule in the
second finger which specifies a frequent interaction with
both G and T, viz the occurrence of Ser or Thr at position +6,
which may donate a hydrogen bond to either base.
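
The practical consequence of this constraint for library coverage can be counted directly. The sketch below simply enumerates the 64 possible triplets and separates those whose 5' base is G or T from those whose 5' base is A or C; it is an illustration of the statement above, not part of the original analysis, and the variable names are hypothetical.

```python
from itertools import product

BASES = "GATC"
# 5' bases that the text reports as recognisable in this library,
# owing to the invariant Asp at position +2 of finger 3.
FIVE_PRIME_ALLOWED = {"G", "T"}

all_triplets = ["".join(t) for t in product(BASES, repeat=3)]
accessible = [t for t in all_triplets if t[0] in FIVE_PRIME_ALLOWED]
excluded = [t for t in all_triplets if t[0] not in FIVE_PRIME_ALLOWED]

if __name__ == "__main__":
    print(len(all_triplets), len(accessible), len(excluded))
```

Half of all triplets therefore lie outside the reach of a library built on this framework, which is why the text recommends omitting or varying the interaction in future libraries.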
Modulation of Base Recognition by Auxiliary Positions 

As noted above, position +2 is able to specify the base
directly 3' of the 'cognate triplet', and can thus work in
conjunction with position +6 of the preceding finger. The
binding site signatures, whilst pointing to amino acid-base
contacts from the three primary positions, indicate that
auxiliary positions can play other parts in base recognition.
A clear case in point is Gln at position -1, which is specific
for A at the 3' end of a triplet when position +2 is a small
non-polar amino acid such as Ala, though specific for T
when polar residues such as Ser are at position +2. The
strong correlation between Arg at position -1 and Asp at
position +2, the basis of which is understood from the X-ray
crystal structures of zinc fingers, is another instance of
interplay between these two positions. Thus the amino acid
at position +2 is able to modulate or enhance the specificity
of the amino acid at other positions.
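
The contacts of Table 2, including the pairing of position -1 with position +2 just described, can be held in a small lookup. The sketch below is one reading of the table, transcribed under the assumption that each entry belongs to the base and triplet position indicated; the dictionary and function names are hypothetical, and an empty list means only that the table proposes no contact, not that none is possible.

```python
# Contacts as read from Table 2; keys are (base, position in triplet).
TABLE_2 = {
    ("G", "5'"): ["Arg +6", "Ser +6/Asp ++2", "Thr +6/Asp ++2"],
    ("G", "middle"): ["His +3"],
    ("G", "3'"): ["Arg -1/Asp +2"],
    ("A", "5'"): [],
    ("A", "middle"): ["Asn +3"],
    ("A", "3'"): ["Gln -1/Ala +2"],
    ("T", "5'"): ["Ser +6/Asp ++2", "Thr +6/Asp ++2"],
    ("T", "middle"): ["Ala +3", "Ser +3", "Val +3"],
    ("T", "3'"): ["Asn -1", "Gln -1/Ser +2"],
    ("C", "5'"): [],
    ("C", "middle"): ["Asp +3", "Leu +3", "Thr +3", "Val +3"],
    ("C", "3'"): ["Asp -1"],
}

POSITIONS = ("5'", "middle", "3'")


def propose_contacts(triplet):
    """Candidate amino acid contacts for each base of a DNA triplet."""
    if len(triplet) != 3:
        raise ValueError("a triplet has exactly three bases")
    return {
        position: list(TABLE_2[(base, position)])
        for base, position in zip(triplet.upper(), POSITIONS)
    }


if __name__ == "__main__":
    # Gln -1 reads A when paired with Ala +2, and T when paired with Ser +2.
    print(propose_contacts("GCA")["3'"])
    print(propose_contacts("GCT")["3'"])
```

Such a lookup only proposes candidates. As the text goes on to explain, contacts cannot be combined freely, so any proposal would still need checking by binding site signature and by measurement of dissociation constants.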

At position +3, a different type of modulation is seen in
the case of Thr and Val which most often prefer C in the
middle position of a triplet, but in some zinc fingers are able
to recognise both C and T. This ambiguity occurs possibly
as a result of different hydrophobic interactions involving
the methyl groups of these residues, and here a flexibility in
the inclination of the finger rather than an effect from
another position per se may be the cause of ambiguous
reading.
Quantitative Measurements of Dissociation Constants

The binding site signature of a zinc finger reveals its
differential base preferences at a given concentration of
DNA. As the concentration of DNA is altered, one can
expect the binding site signature of any clone to change,
being more distinctive at low [DNA], and becoming less so
at higher [DNA] as the Kd of less favourable sites is
approached and further bases become acceptable at each
position of the triplet. Furthermore, because two base
positions are randomly occupied in any one library of
oligonucleotides, binding site signatures are not formally
able to exclude the possibility of context dependence for
some interactions. Therefore to supplement binding site
signatures, which are essentially comparative, quantitative
determinations of the equilibrium dissociation constant of
each phage for different DNA binding sites are required.
After phage display selection and binding site signatures,
these are the third and definitive stage in assessing the
specificity of zinc fingers.

Examples of such studies presented in FIGS. 5A-5F
reveal that zinc finger phages bind the operators indicated in
their binding site signatures with Kds in the range of 10^-8
to 10^-9 M, and can discriminate against closely related
binding sites by factors greater than an order of magnitude.
Indeed FIGS. 5A-5F show such differences in affinity for
binding sites which differ in only one out of nine base pairs.
Since the zinc fingers in our panel were selected from a
library by non-competitive affinity purification, there is the
possibility that fingers which are even more discriminatory
can be isolated using a competitive selection process.

Measurements of dissociation constants allow different
triplets to be ranked in order of preference according to the
strength of binding. The examples here indicate that the
contacts from either position -1 or +3 can contribute to
discrimination. Also, the ambiguity in certain binding site
signatures referred to above can be shown to have a basis in
the equal affinity of certain fingers for closely related
triplets. This is demonstrated by the Kds of the finger containing
the amino acid sequence RGDALTSHER (Seq ID No. 100)
for the triplets TTG and GTG.
A Code for Zinc Finger-DNA Recognition

One would expect that the versatility of the zinc finger
motif will have allowed evolution to develop various modes
of binding to DNA (and even to RNA), which will be too
diverse to fall under the scope of a single code. However,
although a code may not apply to all zinc finger-DNA
interactions, there is now convincing evidence that a code
applies to a substantial subset. This code will fall short of
being able to predict unfailingly the DNA binding site
preference of any given zinc finger from its amino acid
sequence, but may yet be sufficiently comprehensive to
allow the design of zinc fingers with specificity for a given
DNA sequence.

Using the selection methods of phage display (as
described above) and of binding site signatures it is found
that in the case of Zif268-like zinc fingers, DNA recognition
involves four fixed principal (three primary and one
auxiliary) positions on the α-helix, from where a limited and
specific set of amino acid-base contacts result in recognition
of a variety of DNA triplets. In other words, a code can
describe the interactions of zinc fingers with DNA. Towards
this code, one can propose amino acid-base contacts for
almost all the entries in a matrix relating each base to each
position of a triplet (Table 2). Where there is overlap, the
results presented here complement those of Desjarlais and
Berg who have derived similar rules by altering zinc finger
specificity using database-guided mutagenesis (Desjarlais &
Berg 1992 Proc. Natl. Acad. Sci. USA 89, 7345-7349;
Desjarlais & Berg 1993 Proc. Natl. Acad. Sci. USA 90,
2256-2260).
Combinatorial use of the Coded Contacts

The individual base contacts listed in Table 2, though part
of a code, may not always result in sequence specific binding
to the expected base triplet when used in any combination.
In the first instance one must be aware of the possibility that
zinc fingers may not be able to recognise certain
combinations of bases in some triplets by use of this code, or even
at all. Otherwise, the majority of inconsistencies may be
accounted for by considering variations in the inclination of
the trident reading head of a zinc finger with respect to the
triplet with which it is interacting. It appears that the identity
of an amino acid at any one α-helical position is attuned to
the identity of the residues at the other two positions to allow
three base contacts to occur simultaneously. Therefore, for
example, in order that Ala may pick out T in the triplet GTG,
Arg must not be used to recognise G from position +6, since
this would distance the former too far from the DNA (see for
example the finger containing the amino acid sequence
RGDALTSHER) (Seq ID No. 100). Secondly, since the
pitch of the α-helix is 3.6 amino acids per turn, positions -1,
+3 and +6 are not an integral number of turns apart, so that
position +3 is nearer to the DNA than are -1 or +6. Hence,
for example, short amino acids such as His and Asn, rather
than the longer Arg and Gln, are used for the recognition of
purines in the middle position of a triplet.

As a consequence of these distance effects one might say
that the code is not really "alphabetic" (always identical
amino acid:base contact) but rather "syllabic" (use of a small
repertoire of rules, but base contacts). An alphabetic code
would involve only four rules, but syllabicity adds an
additional level of complexity, since systematic
combinations of rules comprise the code. Nevertheless, the
recognition of each triplet is still best described by a code of
syllables, rather than a catalogue of "logograms"
(idiosyncratic amino acid:base contact depending on triplet).
Conclusions

The "syllabic" code of interactions with DNA is made
possible by the versatile framework of the zinc finger: this
allows an adaptability at the interface with DNA by slight
changes of orientation, which in turn maintains a
stoichiometry of one coplanar amino acid per base-pair in many
different complexes. Given this mode of interaction between
amino acids and bases it is to be expected that recognition
of G and A by Arg and Asn/Gln respectively are important
features of the code; but remarkably other interactions can
be more discriminatory than was anticipated (Seeman et al.,
1976). Conversely, it is clear that degeneracy can be
programmed in the zinc fingers in varying degrees allowing for
intricate interactions with different regulatory DNA
sequences (Harrison & Travers, 1990; Christy & Nathans,
1989). One can see how this principle makes possible the
regulation of differential gene expression by a limited set of
transcription factors.

As already noted above, the versatility of the finger motif
will likely allow other modes of binding to DNA. Similarly,
one must take into account the malleability of nucleic acids
such as is observed in Fairall et al., where a deformation of
the double helix at a flexible base step allows a direct contact
from Ser at position +2 of finger 1 to a T at the 3' position
of the cognate triplet. Even in our selections there are
instances of fingers whose binding mode is obscure, and
may require structural analyses for clarification. Thus, water
may be seen to play an important role, for example where
short side chains such as Asp, Asn or Ser interact with bases
from position -1 (Qian et al., 1993 J. Am. Chem. Soc. 115,
1189-1190; Shakked et al., 1994 Nature (London) 368,
469-478).

Eventually, it might be possible to develop a number of
codes describing zinc finger binding to DNA, which could
predict the binding site preferences of some zinc fingers
from their amino acid sequence. The functional amino acids
selected at positions -1, +3 and to an extent +6 in this study,
are very frequently observed at the same positions in
naturally occurring fingers (e.g. see FIG. 4. of Desjarlais and
Berg 1992 Proteins 12, 101-104) supporting the existence of
coded contacts from these three positions. However, the lack
of definitive predictive methods is not a serious practical
limitation as current laboratory techniques (here and in
Thiesen & Bach 1990 and Pollock & Treisman 1990) will
allow the identification of binding sites for a given
DNA-binding protein. Rather, one can apply phage selection and
a knowledge of the recognition rules to the converse
problem, namely the design of proteins to bind
predetermined DNA sites.
Prospects for the Design of DNA-binding Proteins

The ability to manipulate the sequence specificity of zinc
fingers implies that we are on the eve of designing
DNA-binding proteins with desired specificity for applications in
medicine and research (Desjarlais & Berg, 1993; Rebar &
Pabo, 1994). This is possible because, by contrast to all other
DNA-binding motifs, we can avail ourselves of the modular
nature of the zinc finger, since DNA sites can be recognised
by appropriate combinations of independently acting
fingers linked in tandem.

The coded interactions of zinc fingers with DNA can be
used to model the specificity of individual zinc fingers de
novo, or more likely in conjunction with phage display
selection of suitable candidates. In this way, according to
requirements, one could modulate the affinity for a given
binding site, or even engineer an appropriate degree of
indiscrimination at particular base positions. Moreover, the
additive effect of multiply repeated domains offers the
opportunity to bind specifically and tightly to extended, and
hence very rare, genomic loci. Thus zinc finger proteins
might well be a good alternative to the use of antisense
nucleic acids in suppressing or modifying the action of a
given gene, whether normal or mutant. To this end, extra
functions could be introduced to these DNA binding
domains by appending suitable natural or synthetic effectors.

EXAMPLE 3

From the evidence presented in the preceding examples,
the inventors propose that specific DNA-binding proteins
comprising zinc fingers can be "made to measure". To
demonstrate their potential the inventors have created a three
finger polypeptide able to bind site-specifically to a unique
9 bp region of a BCR-ABL fusion oncogene and to
discriminate it from the parent genomic sequences (Kurzrock et
al., 1988 N. Engl. J. Med. 319, 990-998). Using
transformed cells in culture as a model, it is shown that binding
to the target oncogene in chromosomal DNA is possible,
resulting in blockage of transcription. Consequently, murine
cells made growth factor-independent by the action of the
oncogene (Daley et al., 1988 Proc. Natl. Acad. Sci. U.S.A.
85, 9312-9316) are found to revert to factor dependence on
transient transfection with a vector expressing the designed
zinc finger polypeptide.

DNA-binding proteins designed to recognise specific
DNA sequences could be incorporated in chimeric
transcription factors, recombinases, nucleases etc. for a wide range of
applications. The inventors have shown that zinc finger
mini-domains can discriminate between closely related
DNA triplets, and have proposed that they can be linked
together to form domains for the specific recognition of
longer DNA sequences. One interesting possibility for the
use of such protein domains is to target selectively genetic
differences in pathogens or transformed cells. Here one such
application is described.

There exists a set of human leukaemias in which a
reciprocal chromosomal translocation t(9;22)(q34;q11) results in
a truncated chromosome 22, the Philadelphia chromosome
(Ph1), encoding at the breakpoint a fusion of sequences
from the c-ABL protooncogene (Bartram et al., 1983 Nature
306, 277-280) and the BCR gene (Groffen et al., 1984 Cell
36, 93-99). In chronic myelogenous leukaemia (CML), the
breakpoints usually occur in the first intron of the c-ABL
gene and in the breakpoint cluster region of the BCR gene
(Shtivelman et al., 1985 Nature 315, 550-554), and give rise
to a p210 BCR-ABL gene product (Konopka et al., 1984 Cell
37, 1035-1042). Alternatively, in acute lymphoblastic
leukaemia (ALL), the breakpoints usually occur in the first
introns of both BCR and c-ABL (Hermans et al., 1987 Cell
51, 33-40), and result in a p190 BCR-ABL gene product (FIG.
6) (Kurzrock et al., 1987 Nature 325, 631-635).

FIG. 6 shows the nucleotide sequences (Seq ID No.s
9-11) of the fusion point between BCR and ABL sequences
in p190 cDNA, and of the corresponding exon boundaries in
the BCR and c-ABL genes. Exon sequences are written in
capital letters while introns are given in lowercase. Line 1
shows p190 BCR-ABL cDNA; line 2 the BCR genomic
sequence at junction of exon 1 and intron 1; and line 3 the
ABL genomic sequence at junction of intron 1 and exon 2
(Hermans et al., 1987). The 9 bp sequence in the p190 BCR-ABL
cDNA used as a target is underlined, as are the homologous
sequences in genomic BCR and c-ABL.

Facsimiles of these rearranged genes act as dominant
transforming oncogenes in cell culture (Daley et al., 1988)
and transgenic mice (Heisterkamp et al., 1990 Nature 344,
251-253). Like their genomic counterparts, the cDNAs bear
a unique nucleotide sequence at the fusion point of the BCR
and c-ABL genes, which can be recognised at the DNA level
by a site-specific DNA-binding protein. The present
inventors have designed such a protein to recognise the unique
fusion site in the p190 BCR-ABL cDNA. This fusion is
obviously distinct from the breakpoints in the spontaneous
genomic translocations, which are thought to be variable
among patients. Although the design of such peptides has
implications for cancer research, the primary aim here is to
prove the principle of protein design, and to assess the
feasibility of in vivo binding to chromosomal DNA in
available model systems.

A nine base-pair target sequence (GCA, GAA, GCC) for
a three zinc finger peptide was chosen which spanned the
fusion point of the p190 BCR-ABL cDNA (Hermans et al.,
1987). The three triplets forming this binding site were each
used to screen a zinc finger phage library over three rounds
as described above in example 1. The selected fingers were
then analysed by binding site signatures to reveal their
preferred triplet, and mutations to improve specificity were
made to the finger selected for binding to GCA. A phage
display mini-library of putative BCR-ABL-binding
three-finger proteins was cloned in fd phage, comprising six
possible combinations of the six selected or designed fingers
(1A, 1B, 2A, 3A, 3B and 3C) linked in the appropriate order.
These fingers are illustrated in FIG. 7 (Seq ID No.s 12-17).
In FIG. 7 regions of secondary structure are underlined
below the list, while residue positions are given above,
relative to the first position of the α-helix (position 1). Zinc
finger phages were selected from a library of 2.6x10^6
variants, using three DNA binding sites each containing one
of the triplets GCC, GAA or GCA. Binding site signatures
(example 2) indicate that fingers 1A and 1B specify the
triplet GCC, finger 2A specifies GAA, while the fingers
selected using the triplet GCA all prefer binding to GCT.
Amongst the latter is finger 3A, the specificity of which we
believed, on the basis of recognition rules, could be changed
by a point mutation. Finger 3B, based on the selected finger
3A, but in which Gln at helical position +2 was altered to Ala,
should be specific for GCA. Finger 3C is an alternative
version of finger 3A, in which the recognition of C is
mediated by Asp +3 rather than by Thr +3.

The mini library was screened once with an
oligonucleotide containing the 9 base-pair BCR-ABL target sequence
to select for tight binding clones over weak binders and
background vector phage. Because the library was small, the
inventors did not include competitor DNA sequences for
homologous regions of the genomic BCR and c-ABL genes
but instead checked the selected clones for their ability to
discriminate. It was found that although all the selected
clones were able to bind the BCR-ABL target sequence and
to discriminate between this and the genomic-BCR
sequence, only a subset could discriminate against the
c-ABL sequence which, at the junction between intron 1 and
exon 2, has an 8/9 base-pair homology to the BCR-ABL

was capable of site-specific DNA-binding, in vivo. The
peptide was fused to the VP16 activation domain from
herpes simplex virus (Fields 1993 Methods 5, 116-124) and
used in transient transfection assays (FIG. 9) to drive
production of a CAT (chloramphenicol acetyl transferase)
reporter gene from a binding site upstream of the TATA box
(Gorman et al., Mol. Cell. Biol. 2, 1044-1051). In detail, the
experiment was performed thus: reporter plasmids
pMCAT6BA, pMCAT6A, and pMCAT6B, were constructed
by inserting 6 copies of the p190 BCR-ABL target site
(CGCAGAAGCC) (Seq ID No. 121), the c-ABL second
exon-intron junction sequence (TCCAGAAGCC) or the
BCR first exon-intron junction sequence (CGCAGGTGAG)
(Seq ID No. [...]) respectively, into pMCAD (Luscher et al.,
1989 Genes Dev. 3, 1507-1517). The anti-BCR-ABL/VP16
expression vector was generated by inserting the in-frame
[...] between the activation domain of herpes simplex
virus [...]. C3H10T1/2 cells were transiently
co-transfected with 10 mg of reporter plasmid and 10 mg of
expression vector. RSVL (de Wet et al., 1987 Mol. Cell Biol.
7, 725-737), which contains the Rous sarcoma virus long
terminal repeat linked to luciferase, was used as an internal
control to normalise for differences in transfection
efficiency. Cells were transfected by the calcium phosphate
precipitation method and CAT assays performed as

target sequence (Hermans et ai., 1987). Sequencing of the described (Sanchez- Garcia et al., 1993 EMBO J. 12, 

discriminating clones revealed two types of selected peptide, 4243-4250). Plasmid pGSEC, which has five consensus 

one with the composition 1A-2A-3B and the other with ^ 17-mer GAL4-ninding sites upstream from the minimal 

1B-2A-3B. Thus both peptides carried the third finger (3B) " promoter of the adenovirus Elb TATA box, and pMlVP16 

which was specifically designed against the triplet GCA but vector, which encodes an in-frame fusion between the 

peptide 1 A-2A-3B was able to bind to the BCR-ABL target DNA-binding domain of GAL4 and the activation domain of 

sequence with higher affinity than was peptide 1B-2A-3B. herpes simplex virus VP16, were used as a positive control 

The peptide 1A-2A-3B, henceforth referred to as the 35 (Sadowski et al, 1992 Gene 118, 137-141). 

anti-BCR-ABL peptide, was used in further experiments. C3H10T1/2 cells were transiently cotransfected with a 

The anti-BCR-ABL peptide has an apparent equilibrium CAT reporter plasmid and an anti-BCR-ABL/VP 16 expres- 

dissociation constant (K d ) of 6.2±0.4xl0" 7 M for the sion vector (pZNIA). 

^gQBCK-ABL c j)jgA sequence in vitro, and discriminates A specific (thirty-fold) increase in CAT activity was 

against the similar sequences found in genomic BCR and 40 observed in cells cotransfected with reporter plasmid bear- 

C-ABL DNA, by factors greater than an order of magnitude ing copies of the p i90 5Ci? " ABL cDNA target site, compared 

(FIG. 8). Referring to FIG. 8, (which illustrates discrimina- to a barely detectable increase in cells cotransfected with 

turn in the binding, of the anti-BCR-ABL peptide to its reporter plasmid bearing copies of cither the BCR or c-ABL 

pl90 i5C ^" /U5i target site and to like regions of genomic BCR semihomologous sequences, indicating in vivo binding, 

and c-ABL), the graph shows binding (measured as an 45 The selective stimulation of transcription indicates con- 

A 45o-6so) a t various [DNA]. Binding reactions and complex vincingly that highly site-specific DNA-binding can occur in 

detection by enzyme immunoassay were performed as vivo. However, while transient transfections assay binding 

described previously, and a full curve analysis was used in to plasmid DNA, the true target site for this and most other 

calculations of the K^Choo & Klug 1993). The DNA used DNA-binding proteins is in genomic DNA. This might well 

were oligonucleotides spanning 9 bp either side of the fusion 50 present significant problems, not least since this DNA is 

point in the cDNA or the exon boundaries. The anti-BCR- physically separated from the cytosol by the nuclear 

ABL peptide binds to its intended target site with a membrane, but also since it may be packaged within chro- 

K ( ^-»6.2±0.4xl0" 7 M, and is able to discriminate against matin. 

genomic BCR and c-ABL sequences, though the latter To study whether genomic targeting is possible, a con- 
differs by only one base pair in the bound 9 bp region. The 55 struct was made in which the anti-BCR-ABL peptide was 
measured dissociation constant is higher than that of three- flanked at the N-terminus with the nuclear localisation signal 
finger peptides from naturally occurring proteins such as Spl from the large T antigen of S V40 virus (Kalderon et al 1984 
(Kadonga et al., 1987 Cell 51, 1079-1090) or Zif268 Cell 499-509), and at the C-terminus with an 11 amino acid 
(Christy et al., 1988), which have K^s in the range of 10 9 M, c-myc epitope tag recognisable by the 9E1 0 antibodv (Evan 
but rather is comparable to that of the two fingers from the 6 o et al, 1985 Mol, Cell. Biol. 5, 3610-3616). This construct 
tramtrack (ttk) protein (Fairall et al., 1992). However, the was used to transiently transfect the IL-3-dependent murine 
affinity of the anti-BCR-ABL peptide could be refined, if cell line Ba/F3 (Palacios & Stcinmctz 1985 Cell 41, 
desired, by site-directed mutations or by " affinity matura- 727-734), or alternatively Ba/F3+pl 90 and Ba/F3+p210 cell 
tion"of a phage display Hbrary (Hawkins etal., 1992 J. Mol. fines previously made IL-3-independent by integrated plas- 
Bioi. 226, 889-896). 65 m ^ constructs expressing either pl90 BC/? ' A£L or p210 £?c *" 
Having, established DNA discrimination in vitro, the abjl, respectively. Staining of the cells with the 9E10 anti- 
inventors wished to test whether the anti-BCR-ABL peptide body followed by a secondary fluorescent conjugate showed 
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efficient nuclear localisation in those cells transfected with 
the anti-BCR-ABL peptide. 
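
The practical meaning of the in vitro discrimination reported above can be illustrated numerically. The following is a minimal sketch only, assuming simple 1:1 equilibrium binding with the peptide in excess; the function name `fraction_bound` is illustrative, and the off-target dissociation constant is a hypothetical value set at exactly ten times the measured Kd, since the text states only that discrimination exceeds an order of magnitude.

```python
# Sketch under stated assumptions: 1:1 equilibrium binding, peptide in excess.
KD_TARGET = 6.2e-7               # M, Kd reported for the p190 BCR-ABL target site
KD_OFF_TARGET = 10 * KD_TARGET   # M, HYPOTHETICAL: "greater than an order of magnitude"


def fraction_bound(kd_molar: float, free_peptide_molar: float) -> float:
    """Fraction of DNA sites occupied at equilibrium for simple 1:1 binding."""
    if kd_molar <= 0 or free_peptide_molar < 0:
        raise ValueError("Kd must be positive and concentration non-negative")
    return free_peptide_molar / (kd_molar + free_peptide_molar)


if __name__ == "__main__":
    # Illustrative peptide concentrations spanning the measured Kd.
    for conc in (6.2e-8, 6.2e-7, 6.2e-6):
        on = fraction_bound(KD_TARGET, conc)
        off = fraction_bound(KD_OFF_TARGET, conc)
        print(f"[peptide] = {conc:.1e} M  target: {on:.2f}  off-target: {off:.2f}")
```

Under these assumptions, a peptide concentration equal to the measured Kd half-saturates the target site while occupying roughly one site in eleven of a sequence bound ten-fold more weakly.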

The experimental details were as follows: the anti-BCR-ABL expression vector was generated in the pEF-BOS vector (Mizushima & Shigezaku 1990), including an 11 amino acid c-myc epitope tag (EQKLISLEDLN) SEQ ID NO: 124 at the carboxy-terminal end, recognizable by the 9E10 antibody (Evan et al., 1985) and the nuclear localization signal PKKKRKV SEQ ID NO: 125 of the large T antigen of SV40 virus (Kalderon et al., 1984) at the amino-terminal end. Three glycine residues were introduced downstream of the nuclear localization signal as a spacer, to ensure exposure of the nuclear leader from the folded molecule. Ba/F3 cells were transfected with 25 µg of the anti-BCR-ABL expression construct tagged with the 9E10 c-myc epitope as described (Sanchez-Garcia & Rabbitts 1994 Proc. Natl. Acad. Sci. U.S.A. in press) and protein production analyzed 48 h later by immunofluorescence-labelling as follows. Cells were fixed in 4% (w/v) paraformaldehyde for 15 min, washed in phosphate-buffered saline (PBS), and permeabilized in methanol for 2 min. After blocking in 10% fetal calf serum in PBS for 30 min, the mouse 9E10 antibody was added. After a 30 min incubation at room temperature a fluorescein isothiocyanate (FITC)-conjugated goat anti-mouse IgG (SIGMA) was added and incubated for a further 30 min. Fluorescent cells were visualized using a confocal scanning microscope (magnification, 200x).

Immunofluorescence of Ba/F3+p190 and Ba/F3+p210 cells transiently transfected with the anti-BCR-ABL expression vector and stained with the 9E10 antibody was performed. Expression and nuclear localisation of the anti-BCR-ABL peptide were observed. In addition, transfected Ba/F3+p190 cells show chromatin condensation and nuclear fragmentation into small apoptotic bodies, whereas neither untransfected Ba/F3+p190 cells nor transfected Ba/F3+p210 cells do.

The efficiency of transient transfection, measured as the proportion of immunofluorescent cells in the population, was 15-20%. When IL-3 is withdrawn from tissue culture, a corresponding proportion of Ba/F3+p190 cells are found to have reverted to factor dependence and die, while Ba/F3+p210 cells are unaffected. The experimental details were as follows: cell lines Ba/F3, Ba/F3+p190 and Ba/F3+p210 were maintained in Dulbecco's modified Eagle's medium (DMEM) supplemented with 10% fetal bovine serum. In the case of the Ba/F3 cell line, 10% WEHI-3B-conditioned medium was included as a source of IL-3. After the transfection with the anti-BCR-ABL expression vector, cells (5 x 10^5/ml) were washed twice in serum-free medium and cultured in DMEM medium with 10% fetal bovine serum without WEHI-3B-conditioned medium. Percentage viability was determined by trypan blue exclusion. Data are expressed as means of triplicate cultures. The results are shown in graphical form in FIG. 9.
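
The viability arithmetic described above (trypan blue exclusion, averaged over triplicate cultures) can be sketched as follows. This is an illustration only: the function names are ours, and the cell counts in the usage example are hypothetical numbers, not data from these experiments.

```python
from statistics import mean


def percent_viability(unstained_cells: int, stained_cells: int) -> float:
    """Percentage of cells excluding trypan blue (unstained cells are scored viable)."""
    total = unstained_cells + stained_cells
    if total == 0:
        raise ValueError("no cells counted")
    return 100.0 * unstained_cells / total


def mean_viability(replicate_counts) -> float:
    """Mean percentage viability over replicate (unstained, stained) counts."""
    return mean(percent_viability(u, s) for u, s in replicate_counts)


if __name__ == "__main__":
    # HYPOTHETICAL triplicate counts, for illustration only.
    triplicate = [(80, 20), (85, 15), (75, 25)]
    print(f"mean viability: {mean_viability(triplicate):.1f}%")
```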

Immunofluorescence microscopy of transfected Ba/F3+p190 cells in the absence of IL-3 shows chromatin condensation and nuclear fragmentation into small apoptotic bodies, while the nuclei of Ba/F3+p210 cells remain intact. Northern blots of total cytoplasmic RNA from Ba/F3+p190 cells transiently transfected with the anti-BCR-ABL peptide revealed reduced levels of p190 BCR-ABL mRNA relative to untransfected cells. By contrast, similarly transfected Ba/F3+p210 cells showed no decrease in the levels of p210 BCR-ABL mRNA (FIG. 12). The blots were performed as follows: 10 µg of total cytoplasmic RNA, from the cells indicated, was glyoxylated and fractionated in 1.4% agarose gels in 10 mM NaPO4 buffer, pH 7.0. After electrophoresis the gel was blotted onto HYBOND-N (Amersham), UV-cross linked and hybridized to a 32P-labelled c-ABL probe. Autoradiography was for 14 h at -70° C. Loading was monitored by reprobing the filters with a mouse β-actin cDNA.

Northern filter hybridisation analysis of Ba/F3+p190 and Ba/F3+p210 cell lines transfected with the anti-BCR-ABL expression vector was performed. When transfected with the anti-BCR-ABL expression vector, a specific downregulation of p190 BCR-ABL mRNA was seen in Ba/F3+p190 cells, while expression of p210 BCR-ABL was unaffected in Ba/F3+p210 cells.

In summary, the inventors have demonstrated that a DNA-binding protein designed to recognise a specific DNA sequence in vitro is active in vivo where, directed to the nucleus by an appended localisation signal, it can bind its target sequence in chromosomal DNA. This is found on otherwise actively transcribing DNA, so presumably binding of the peptide blocks the path of the polymerase, causing stalling or abortion. The use of a specific polypeptide in this case to target intragenic sequences is reminiscent of antisense oligonucleotide- or ribozyme-based approaches to inhibiting the expression of selected genes (Stein & Cheng 1993 Science 261, 1004-1012). Like antisense oligonucleotides, zinc finger DNA-binding proteins can be tailored against genes altered by chromosomal translocations, or point mutations, as well as to regulatory sequences within genes. Also, like oligonucleotides which can be designed to repress transcription by triple helix formation in homopurine-homopyrimidine promoters (Cooney et al., 1988 Science 245, 725-730), DNA-binding proteins can bind to various unique regions outside genes, but in contrast they can direct gene expression by both up- or down-regulating the initiation of transcription when fused to activation (Seipel et al., 1992 EMBO J. 11, 4961-4968) or repression domains (Herschbach et al., 1994 Nature 370, 309-311). In any case, by acting directly on any DNA, and by allowing fusion to a variety of protein effectors, tailored site-specific DNA-binding proteins have the potential to control gene expression, and indeed to manipulate the genetic material itself, in medicine and research.
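
The sequence discrimination underlying this example can be checked directly from the three junction sequences quoted above. The sketch below is illustrative: it assumes that the bound 9 bp region is the last nine bases of each quoted 10-base sequence (that is, GCA GAA GCC read 5' to 3', matching the triplets GCA, GAA and GCC named in the text), and the function names are ours.

```python
# Junction sequences as quoted in the text (5' to 3').
P190_BCR_ABL = "CGCAGAAGCC"   # p190 BCR-ABL cDNA target site
C_ABL = "TCCAGAAGCC"          # c-ABL second exon-intron junction
BCR = "CGCAGGTGAG"            # BCR first exon-intron junction


def bound_region(site: str) -> str:
    """ASSUMPTION: the three fingers contact the last nine bases of the quoted site."""
    return site[-9:]


def matching_positions(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(a, b))


if __name__ == "__main__":
    target = bound_region(P190_BCR_ABL)
    for name, site in (("c-ABL", C_ABL), ("BCR", BCR)):
        n = matching_positions(target, bound_region(site))
        print(f"{name}: {n}/9 positions match the target")
```

On this assumption the c-ABL junction matches the target at 8 of 9 positions, in agreement with the 8/9 homology stated above, while the BCR junction matches at 5 of 9.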

EXAMPLE 4 


The phage display zinc finger library described in the preceding examples could be considered sub-optimal in a number of ways:

i) the library was much smaller than the theoretical maximum size;

ii) the flanking fingers both recognised GCG triplets (in certain cases creating nearly symmetrical binding sites for the three zinc fingers, which enables the peptide to bind to the 'bottom' strand of DNA, thus evading the register of interactions we wished to set);

iii) Asp+2 of finger three ("Asp++2") was dominant over the interactions of finger two (position +6) with the 5' base of the middle triplet;

iv) not all amino acids were represented in the randomised positions.
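
Point (i) turns on how quickly the theoretical maximum grows with the number of randomised positions. The following is a minimal sketch, assuming that every randomised position may carry any of the 20 amino acids (or, at the DNA level, any of the 64 codons); the function names and the position counts in the loop are illustrative and are not taken from the text.

```python
def peptide_variants(n_positions: int, residues_per_position: int = 20) -> int:
    """Distinct peptides when each position is fully randomised."""
    if n_positions < 0:
        raise ValueError("number of positions must be non-negative")
    return residues_per_position ** n_positions


def codon_variants(n_positions: int, codons_per_position: int = 64) -> int:
    """Distinct DNA sequences when each randomised codon may be any triplet."""
    if n_positions < 0:
        raise ValueError("number of positions must be non-negative")
    return codons_per_position ** n_positions


if __name__ == "__main__":
    # Illustrative position counts only.
    for n in (4, 5, 6, 7):
        print(f"{n} positions: {peptide_variants(n):,} peptides, "
              f"{codon_variants(n):,} codon combinations")
```

Restricting full randomisation to a small number of positions keeps the theoretical maximum within reach of achievable transformation frequencies.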

In order to overcome these problems a new three-finger library was created in which:

a) the middle finger is fully randomised in only four positions (-1, +2, +3 and +6) so that the library size is smaller and all codons are represented. The library was cloned in the pCANTAB5E phagemid vector from Pharmacia, which allows higher transformation frequencies than the phage.

[...] binding site. Recognition of the 3' A in the latter triplet by finger three is mediated by Gln-1/Ala+2, the significance of which is that the short Ala+2 should not make contacts to the DNA (in particular with the 5' base of the middle triplet), thus alleviating the problem noted at (iii) above.

EXAMPLE 5

The human ras gene is susceptible to a number of different mutations, which can convert it into an oncogene. A ras [...] particular mutation is known as the G12V mutation (i.e. the polypeptide encoded by the mutant gene contains a substitution from glycine to valine). Because ras oncogenes are so common in human cancers, they are extremely significant targets for potential therapeutic methods.

A three finger protein has been designed which can recognise the G12V mutant of ras. The protein was produced using rational design based on the known specificity rules. In outline, a zinc finger framework (from one of the fingers selected to bind GCC) was modified by point mutations in position +3 to yield fingers recognising two additional different triplets. The finger recognising GCC and the two derivatives were cloned in pCANTAB5E and expressed on the surface of phage.

Originally, the G12V-binding peptide "r-BP" was to be selected from a small library of related proteins. The reason a library was to be used is that while it was clear to us what 8/9 of the amino acid:base contacts should be, it was not clear whether the middle C of the GCC triplet should be recognised by +3 Asp, or Glu, or Ser, or Thr (see Table 2 above). Thus a three-finger peptide gene was assembled from 8 overlapping synthetic oligonucleotides which were annealed and ligated according to standard procedures and the ~300 bp product purified from a 2% agarose gel. The gene for finger 1 contained a partial codon randomisation at [...] amino acids (D, E, S & T) and also certain other residues which were in fact not predicted to be desirable (e.g. Asn). The synthetic oligonucleotides were designed to have SfiI and NotI overhangs when annealed. The ~300 bp fragment was ligated into SfiI/NotI-cut FdSN vector and the ligation mixture was electroporated into DH5α cells. Phage were produced from these as previously described and a selection step carried out using the G12V sequence (also as described) to eliminate phage without insert and those phage of the library which bound poorly.

Following selection, a number of separate clones were isolated and phage produced from these were screened by ELISA for binding to the G12V ras sequence and discrimination against the wild-type ras sequence. A number of clones were able to do this, and sequencing of phage DNA later revealed that these fell into two categories, one of which had the amino acid Asn at the +3 randomised position, and another which had two other undesirable mutations.

The appearance of Asn at position +3 is unexpected and most probably due to the fact that proteins with a cytosine-specific residue at position +3 bind to some E. coli DNA sequence so tightly that they are lethal. Thus phage display selection is not always guaranteed to produce the tightest-binding clone, since passage through bacteria is essential to the technique, and the selected proteins may be those which do not bind to the genome of this host if such binding is deleterious.

Kd measurements show that the clone with Asn+3 nevertheless binds the mutant G12V sequence with a Kd in the [...] sequence. However it was predicted that Asn+3 should specify an adenine residue at the middle position, whereas the polypeptide we wished to make should specify a cytosine for optimal binding.

Thus we assembled a three-finger peptide with a Ser at position +3 of Finger 1 (as shown in FIG. 12), again using synthetic oligos. This time the gene was ligated to pCANTAB5E phagemid. Transformants were isolated in the [...] these conditions reduces the copy [...] so as to make toxic products less abundant in the cells.

The amino acid sequence (Seq ID No. 18) of the fingers is shown in FIG. 12. The numbers refer to the α-helical amino acid residues. The fingers (F1, F2 & F3) bind to the G12V mutant nucleotide sequence:

5' [...] 3'

The bold A shows the single point mutation by which the G12V sequence differs from the wild type sequence.

Assay of the protein in eukaryotes (e.g. to drive CAT reporter production) requires the use of a weak promoter. When expression of the anti-RAS (G12V) protein is strong, the peptide presumably binds to the wild-type ras allele (which is required) leading to cell death. For this reason a regulatable promoter (e.g. for tetracycline) will be used to deliver the protein in therapeutic applications so that protein [...] exceeds the Kd for the G12V point mutated gene but not the Kd for the wild type allele. Since the G12V mutation is a [...] ([...] a cDNA mutation as was the p190 bcr-abl) human cell lines and other animal models can be used in research.

In addition to repressing the expression of the gene, the protein can be used to diagnose the precise point mutation present in the genomic DNA, or more likely in PCR amplified genomic DNA, without sequencing. It should therefore be possible, without further inventive activity, to design diagnostic kits for detecting (e.g. point) mutations on DNA. ELISA-based methods should prove particularly suitable.

It is hoped to fuse the zinc finger binding polypeptide to an scFv fragment which binds to the human transferrin receptor, which should enhance delivery to and uptake by human cells. The transferrin receptor is thought particularly useful but, in theory, any receptor molecule (preferably of high affinity) expressed on the surface of a human target cell could act as a suitable ligand, either for a specific immunoglobulin or fragment, or for the receptor's natural ligand fused or coupled to the zinc finger polypeptide



